Commit Graph

18195 Commits

Author SHA1 Message Date
Sean Farley
311a50fdae patch: use opt.showsimilarity to calculate and show the similarity
Tests have been added.
2017-01-09 11:24:18 -08:00
Sean Farley
bf5e8cb800 patch: add similarity config knob in experimental section
This config knob will control whether or not to show the similarity
calculation in the diff output:

  diff --git a/README.md b/foo.md
  similarity index 88%
  rename from README.md
  rename to foo.md
  --- a/README.md
  +++ b/foo.md
2017-01-09 10:51:44 -08:00
Sean Farley
8fc2b48eb5 similar: move score function to module level
Future patches will use this to report the similarity of a rename / copy
in the patch output.
2017-01-07 20:47:57 -08:00
Yuya Nishihara
5ade140d5c revset: abuse x:y syntax to specify line range of followlines()
This slightly complicates the parsing (see the previous patch), but the
overall result seems not bad.

I keep x:, :y and : for future extension.
2017-01-09 17:58:19 +09:00
Yuya Nishihara
615f3c1669 revset: do not transform range* operators in parsed tree
This allows us to handle x:y range as a general range object. A primary user
of it is followlines().
2017-01-09 16:55:56 +09:00
Yuya Nishihara
0f4a24bbbf revset: add default value to getinteger() helper
This seems handy.
2017-01-09 17:45:11 +09:00
Yuya Nishihara
49d42c696d revset: factor out getinteger() helper
We have 4 revset functions that take integer arguments, and they handle
their arguments in slightly different ways. This patch unifies them:

 - getstring() in place of getsymbol(), which is more consistent with the
   handling of integer revisions (both 1 and '1' are valid)
 - say "expects" instead of "requires" for type errors

We don't need to catch TypeError since getstring() must return a string.
2017-01-09 17:39:44 +09:00
Yuya Nishihara
a73b0aaf6b revset: rename rev argument of followlines() to startrev
The rev argument has the same meaning as startrev of follow(), and I think
startrev is more informative.

followlines() is new function, we can make BC now.
2017-01-09 16:16:26 +09:00
Yuya Nishihara
a0c3bc199a help: use :hg: role and canonical name to point to revset string patterns
Follows up ae418afed3f6. Now revisions.txt and revsets.txt has been merged,
so use revisions.* as a pointer.
2017-01-13 23:48:21 +09:00
Gregory Szorc
4a3b8df214 util: compression APIs to support revlog decompression
Previously, compression engines had APIs for performing revlog
compression but no mechanism to perform revlog decompression. This
patch changes that.

Revlog decompression is slightly more complicated than compression
because in the compression case there is (currently) only a single
engine that can be used at a time. However for decompression, a
revlog could contain chunks from multiple compression engines. This
means decompression needs to map to multiple engines and
decompressors. This functionality is outside the scope of this patch.
But it drives the decision for engines to declare a byte header
sequence that identifies revlog data as belonging to an engine and
an API for obtaining an engine from a revlog header.
2017-01-02 13:27:20 -08:00
Anton Shestakov
9427025e13 crecord: add an experimental option for space key to move cursor down
I really want to have an option of toggling a selection on a line and also
moving cursor down as a single keystroke. It also kinda makes sense for space
key to do this, because some other curses UIs in the wild do this (e.g. various
file managers, htop). So I got an idea to make a config option that defaults to
False for compatibility, but allows making crecord UI a lot more useful for
people with big hunks.

We add this an experimental option to experiment with this behavior.
2017-01-08 10:08:29 +08:00
Gregory Szorc
24c1205d69 revlog: use compression engine API for compression
This commit swaps in the just-added revlog compressor API into
the revlog class.

Instead of implementing zlib compression inline in compress(), we
now store a cached-on-first-use revlog compressor on each revlog
instance and invoke its "compress()" method.

As part of this, revlog.compress() has been refactored a bit to use
a cleaner code flow and modern formatting (e.g. avoiding
parenthesis around returned tuples).

On a mozilla-unified repo, here are the "compress" times for a few
commands:

$ hg perfrevlogchunks -c
! wall 5.772450 comb 5.780000 user 5.780000 sys 0.000000 (best of 3)
! wall 5.795158 comb 5.790000 user 5.790000 sys 0.000000 (best of 3)

$ hg perfrevlogchunks -m
! wall 9.975789 comb 9.970000 user 9.970000 sys 0.000000 (best of 3)
! wall 10.019505 comb 10.010000 user 10.010000 sys 0.000000 (best of 3)

Compression times did seem to slow down just a little. There are
360,210 changelog revisions and 359,342 manifest revisions. For the
changelog, mean time to compress a revision increased from ~16.025us to
~16.088us. That's basically a function call or an attribute lookup. I
suppose this is the price you pay for abstraction. It's so low that
I'm not concerned.
2017-01-02 11:22:52 -08:00
Gregory Szorc
29c30e4b7e util: compression APIs to support revlog compression
As part of "zstd all of the things," we need to teach revlogs to
use non-zlib compression formats. Because we're routing all compression
via the "compression manager" and "compression engine" APIs, we need to
introduction functionality there for performing revlog operations.

Ideally, revlog compression and decompression operations would be
implemented in terms of simple "compress" and "decompress" primitives.
However, there are a few considerations that make us want to have a
specialized primitive for handling revlogs:

1) Performance. Revlogs tend to do compression and especially
   decompression operations in batches. Any overhead for e.g.
   instantiating a "context" for performing an operation can be
   noticed. For this reason, our "revlog compressor" primitive is
   reusable. For zstd, we reuse the same compression "context" for
   multiple operations. I've measured this to have a performance
   impact versus constructing new contexts for each operation.

2) Specialization. By having a primitive dedicated to revlog use,
   we can make revlog-specific choices and leave the door open for
   more functionality in the future. For example, the zstd revlog
   compressor may one day make use of dictionary compression.

A future patch will introduce a decompress() on the compressor
object.

The code for the zlib compressor is basically copied from
revlog.compress(). Although it doesn't handle the empty input
case, the null first byte case, and the 'u' prefix case. These
cases will continue to be handled in revlog.py once that code is
ported to use this API.
2017-01-02 12:39:03 -08:00
Gregory Szorc
1a6670d670 revlog: move decompress() from module to revlog class (API)
Upcoming patches will convert revlogs to use the compression engine
APIs to perform all things compression. The yet-to-be-introduced
APIs support a persistent "compressor" object so the same object
can be reused for multiple compression operations, leading to
better performance. In addition, compression engines like zstd
may wish to tweak compression engine state based on the revlog
(e.g. per-revlog compression dictionaries).

A global and shared decompress() function will shortly no longer
make much sense. So, we move decompress() to be a method of the
revlog class. It joins compress() there.

On the mozilla-unified repo, we can measure the impact of this change
on reading performance:

$ hg perfrevlogchunks -c
! chunk
! wall 1.932573 comb 1.930000 user 1.900000 sys 0.030000 (best of 6)
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! chunk batch
! wall 1.787879 comb 1.780000 user 1.770000 sys 0.010000 (best of 6
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)

"chunk" appeared to become slower but "chunk batch" got faster. Upon
further examination by running both sets multiple times, the numbers
appear to converge across all runs. This tells me that there is no
perceived performance impact to this refactor.
2017-01-02 13:00:16 -08:00
Gregory Szorc
df8167ed29 revlog: make compressed size comparisons consistent
revlog.compress() compares the compressed size to the input size
and throws away the compressed data if it is larger than the input.
This is the correct thing to do, as storing compressed data that
is larger than the input takes up more storage space and makes reading
slower.

However, the comparison was implemented inconsistently. For the
streaming compression mode, we threw away the result if it was
greater than or equal to the input size. But for the one-shot
compression, we threw away the compression only if it was greater
than the input size!

This patch changes the comparison for the simple case so it is
consistent with the streaming case.

As a few tests demonstrate, this adds 1 byte to some revlog entries.
This is because of an added 'u' header on the chunk. It seems
somewhat wrong to increase the revlog size here. However, IMO the cost
of 1 byte in storage is insignificant compared to the performance gains
of avoiding decompression. This patch should invite questions around
the heuristic for throwing away compressed data. For example, I'd argue
we should be more liberal about rejecting compressed data, additionally
doing so where the number of bytes saved fails to reach a threshold.
But we can have this discussion another time.
2017-01-02 11:50:17 -08:00
Sean Farley
3c1cbd7c9b similar: rename local variable to not collide with previous
Future patches will move the score function to the module level, so
let's not shadow that.
2017-01-07 20:43:49 -08:00
Sean Farley
25acd53e01 patch: add label for coloring the index extended header
Just like the summary says, this will colorize the:

  index 3d3ba4b65e11..57274a0f46b2 100644

line in the diff output.
2017-01-09 10:59:45 -08:00
Sean Farley
14adabd19a patch: add index line for diff output
This helps highlighting in third-party diff coloring (which assumes git
output) and maintains pedantic correctness with diff --git.

Tests will be added at the end of the series.
2016-12-31 15:41:57 -06:00
Sean Farley
8cd1b5827c patch: add config knob for displaying the index header
This config knob can take an integer between 0 and 40 or a
keyword ('none', 'short', 'full') to control the length of hash to
output. It will display diffs with the git index header as such,

  diff --git a/mercurial/mdiff.py b/mercurial/mdiff.py
  index 112edf7..d6b52c5 100644

We'll put this in the experimental section for now.
2017-01-09 11:13:47 -08:00
Martin von Zweigbergk
9e63f2d21c bisect: refer directly to bisect() revset predicate in help
We have specific syntax for displaying the help text for a particular
revset predicate, so let's refer directly to the bisect() revset in
the verbose bisect help. It seems likely that the user doesn't care
about other revsets at that point, so they will probably not miss the
text about the other revset predicates.
2017-01-12 12:05:23 -08:00
Martin von Zweigbergk
029203f29d help: remove now-redundant pointer to revsets help
"hg help revisions" and "hg help revsets" now point to the same text,
so drop the revsets reference.
2017-01-12 11:52:05 -08:00
Matt Harbison
d3bfb5a06a help: eliminate duplicate text for revset string patterns
There's no reason to duplicate this so many times, and it's likely an instance
will be missed if support for a new pattern is added and documented.  The
stringmatcher is mostly used by revsets, though it is also used for the 'tag'
related templates, and namespace filtering in the journal extension.  So maybe
there's a better place to document it.  `hg help patterns` seems inappropriate,
because that is all file pattern matching.

While here, indicate how to perform case insensitive regex searches.
2017-01-07 23:35:35 -05:00
Matt Harbison
e0b76f5323 revset: add regular expression support to 'desc'
This is a case insensitive predicate like 'author', so it conforms to the
existing behavior of performing a case insensitive regex.
2017-01-07 21:26:32 -05:00
Matt Harbison
840ab22fff revset: stop lowercasing the regex pattern for 'author'
It was probably unintentional for regex, as the meaning of some sequences like
\S and \s is actually inverted by changing the case.  For backward compatibility
however, the matching is forced to case insensitive.
2017-01-11 22:42:10 -05:00
Gregory Szorc
abe1c0e17e repair: clean up stale lock file from store backup
Since we did a directory rename on the stores, the source
repository's lock path now references the dest repository's
lock path and the dest repository's lock path now references
a non-existent filename.

So releasing the lock on the source will unlock the dest and
releasing the lock on the dest will no-op because it fails due
to file not found. So we clean up the dest's lock manually.
2016-11-24 18:45:29 -08:00
Gregory Szorc
a400e3d753 repair: copy non-revlog store files during upgrade
The store contains more than just revlogs. This patch teaches the
upgrade code to copy regular files as well.

As the test changes demonstrate, the phaseroots file is now copied.
2016-11-24 18:34:50 -08:00
Gregory Szorc
93504084a0 repair: migrate revlogs during upgrade
Our next step for in-place upgrade is to migrate store data. Revlogs
are the biggest source of data within the store and a store is useless
without them, so we implement their migration first.

Our strategy for migrating revlogs is to walk the store and call
`revlog.clone()` on each revlog. There are some minor complications.

Because revlogs have different storage options (e.g. changelog has
generaldelta and delta chains disabled), we need to obtain the
correct class of revlog so inserted data is encoded properly for its
type.

Various attempts at implementing progress indicators that didn't lead
to frustration from false "it's almost done" indicators were made.

I initially used a single progress bar based on number of revlogs.
However, this quickly churned through all filelogs, got to 99% then
effectively froze at 99.99% when it got to the manifest.

So I converted the progress bar to total revision count. This was a
little bit better. But the manifest was still significantly slower
than filelogs and it took forever to process the last few percent.

I then tried both revision/chunk bytes and raw bytes as the
denominator. This had the opposite effect: because so much data is in
manifests, it would churn through filelogs without showing much
progress. When it got to manifests, it would fill in 90+% of the
progress bar.

I finally gave up having a unified progress bar and instead implemented
3 progress bars: 1 for filelog revisions, 1 for manifest revisions, and
1 for changelog revisions. I added extra messages indicating the total
number of revisions of each so users know there are more progress bars
coming.

I also added extra messages before and after each stage to give extra
details about what is happening. Strictly speaking, this isn't
necessary. But the numbers are impressive. For example, when converting
a non-generaldelta mozilla-central repository, the messages you see are:

   migrating 2475593 total revisions (1833043 in filelogs, 321156 in manifests, 321394 in changelog)
   migrating 1.67 GB in store; 2508 GB tracked data
   migrating 267868 filelogs containing 1833043 revisions (1.09 GB in store; 57.3 GB tracked data)
   finished migrating 1833043 filelog revisions across 267868 filelogs; change in size: -415776 bytes
   migrating 1 manifests containing 321156 revisions (518 MB in store; 2451 GB tracked data)

That "2508 GB" figure really blew me away. I had no clue that the raw
tracked data in mozilla-central was that large. Granted, 2451 GB is in
the manifest and "only" 57.3 GB is in filelogs. But still.

It's worth noting that gratuitous loading of source revlogs in order
to display numbers and progress bars does serve a purpose: it ensures
we can open all source revlogs. We don't want to spend several minutes
copying revlogs only to encounter a permissions error or similar later.

As part of this commit, we also add swapping of the store directory
to the upgrade function. After revlogs are converted, we move the
old store into the backup directory then move the temporary repo's
store into the old store's location. On well-behaved systems, this
should be 2 atomic operations and the window of inconsistency show be
very narrow.

There are still a few improvements to be made to store copying and
upgrading. But this commit gets the bulk of the work out of the way.
2016-12-18 17:00:15 -08:00
Gregory Szorc
4dbc7459c8 revlog: add clone method
Upcoming patches will introduce functionality for in-place
repository/store "upgrades." Copying the contents of a revlog
feels sufficiently low-level to warrant being in the revlog
class. So this commit implements that functionality.

Because full delta recomputation can be *very* expensive (we're
talking several hours on the Firefox repository), we support
multiple modes of execution with regards to delta (re)use. This
will allow repository upgrades to choose the "level" of
processing/optimization they wish to perform when converting
revlogs.

It's not obvious from this commit, but "addrevisioncb" will be
used for progress reporting.
2016-12-18 17:02:57 -08:00
Gregory Szorc
b9b6954ea9 repair: begin implementation of in-place upgrading
Now that all the upgrade planning work is in place, we can start
doing the real work: actually upgrading a repository.

The main goal of this commit is to get the "framework" for running
in-place upgrade actions in place.

Rather than get too clever and low-level with regards to in-place
upgrades, our strategy is to create a new, temporary repository,
copy data to it, then replace the old data with the new. This allows
us to reuse a lot of code in localrepo.py around store interaction,
which will eventually consume the bulk of the upgrade code.

But we have to start small. This patch implements adding new
repository requirements. But it still sets up a temporary
repository and locks it and the source repo before performing the
requirements file swap. This means all the plumbing is in place
to implement store copying in subsequent commits.
2016-12-18 16:59:04 -08:00
Gregory Szorc
a3569d4b71 repair: determine what upgrade will do
This commit introduces code for determining what actions/improvements
an upgrade should perform.

The "upgradefindimprovements" function introduces a mechanism to
return a list of improvements that can be made to a repository.
Each improvement is effectively an action that an upgrade will
perform. Associated with each of these improvements is metadata
that will be used to inform users what's wrong and what an
upgrade will do.

Each "improvement" is categorized as a "deficiency" or an
"optimization." TBH, I'm not thrilled about the terminology and
am receptive to constructive bikeshedding. The main difference
between a "deficiency" and an "optimization" is a deficiency
is always corrected (if it deviates from the current config) and
an "optimization" is an optional action that goes above and beyond
to improve the state of the repository (usually by requiring more
CPU during upgrade).

Our initial set of improvements identifies missing repository
requirements, a single, easily correctable problem with
changelog storage, and a set of "optimizations" related to delta
recalculation.

The main "upgraderepo" function has been expanded to handle
improvements. It queries for the list of improvements and determines
which of them will run based on the current repository state and user

I went through numerous iterations of the output format before
settling on a ReST-inspired definition list format. (I used
bulleted lists in the first submission of this commit and could
not get it to format just right.) Even with the various iterations,
I'm still not super thrilled with the format. But, this is a debug*
command, so that should mean we can refine the output without BC
concerns.
2016-12-18 16:51:09 -08:00
Gregory Szorc
f42e2dcaac repair: implement requirements checking for upgrades
This commit introduces functionality for upgrading a repository in
place. The first part that's implemented is testing for upgrade
"compatibility." This is done by examining repository requirements.

There are 5 functions returning sets of requirements that control
upgrading. Why so many functions? Mainly to support extensions.
Functions are easier to monkeypatch than module variables.

Astute readers will see that we don't support "manifestv2" and
"treemanifest" requirements in the upgrade mechanism. I don't have
a great answer for why other than this is a complex set of patches
and I don't want to deal with the complexity of these experimental
features just yet. We can teach the upgrade mechanism about them
later, once the basic upgrade mechanism is in place.

This commit also introduces the "upgraderepo" function. This will be
our main routine for performing an in-place upgrade. Currently, it
just implements requirements checking. The structure of some code in
this function may look a bit weird (e.g. the inline function that is
only called once). But this will make sense after future commits.
2016-12-18 16:16:54 -08:00
Gregory Szorc
16568ee7f0 debugcommands: stub for debugupgraderepo command
Currently, if Mercurial introduces a new repository/store feature or
changes behavior of an existing feature, users must perform an
`hg clone` to create a new repository with hopefully the
correct/optimal settings. Unfortunately, even `hg clone` may not
give the correct results. For example, if you do a local `hg clone`,
you may get hardlinks to revlog files that inherit the old state.
If you `hg clone` from a remote or `hg clone --pull`, changegroup
application may bypass some optimization, such as converting to
generaldelta.

Optimizing a repository is harder than it seems and requires more
than a simple `hg` command invocation.

This commit starts the process of changing that. We introduce
`hg debugupgraderepo`, a command that performs an in-place upgrade
of a repository to use new, optimal features. The command is just
a stub right now. Features will be added in subsequent commits.

This commit does foreshadow some of the behavior of the new command,
notably that it doesn't do anything by default and that it takes
arguments that influence what actions it performs. These will be
explained more in subsequent commits.
2016-11-24 16:24:09 -08:00
Matt Harbison
86e0681833 util: teach stringmatcher to handle forced case insensitive matches
The 'author' and 'desc' revsets are documented to be case insensitive.
Unfortunately, this was implemented in 'author' by forcing the input to
lowercase, including for regex like '\B'.  (This actually inverts the meaning of
the sequence.)  For backward compatibility, we will keep that a case insensitive
regex, but by using matcher options instead of brute force.

This doesn't preclude future hypothetical 'icase-literal:' style prefixes that
can be provided by the user.  Such user specified cases can probably be handled
up front by stripping 'icase-', setting the variable, and letting it drop
through the existing code.
2017-01-11 21:47:19 -05:00
Matt Harbison
762a49215b revset: point to 'grep' in the 'keyword' help for regex searches
The help for 'grep' already points to 'keyword'.
2017-01-11 23:13:51 -05:00
Martin von Zweigbergk
43fb09f0d0 help: explain that revsets can be used where 1 or 2 revs are wanted
We did not seem to document that one can do things like "hg up :@"
where the last revision of the revset ":@".
2017-01-11 23:13:00 -08:00
Martin von Zweigbergk
1333ea0016 help: explain what the term "revset" means
We refer to revsets in a few places (e.g. in "hg help config"), but we
never explained what they are. Until now.
2017-01-11 22:46:07 -08:00
Martin von Zweigbergk
1840263a8c help: merge revsets.txt into revisions.txt
Selecting single and multiple revisions is closely related, so let's
put it in one place, so users can easily find it. We actually did not
even point to "hg help revsets" from "hg help revisions", but now that
they're on a single page, that won't be necessary.
2017-01-11 11:37:38 -08:00
Martin von Zweigbergk
01bab7fc55 help: use a single paragraph to describe full and abbreviated nodeids
The texts describing 40-digit strings and the abbreviated form are
closely related, so make it a single paragraph.
2017-01-11 11:28:54 -08:00
Gregory Szorc
9849c580fb hgweb: support Content Security Policy
Content-Security-Policy (CSP) is a web security feature that allows
servers to declare what loaded content is allowed to do. For example,
a policy can prevent loading of images, JavaScript, CSS, etc unless
the source of that content is whitelisted (by hostname, URI scheme,
hashes of content, etc). It's a nifty security feature that provides
extra mitigation against some attacks, notably XSS.

Mitigation against these attacks is important for Mercurial because
hgweb renders repository data, which is commonly untrusted. While we
make attempts to escape things, etc, there's the possibility that
malicious data could be injected into the site content. If this happens
today, the full power of the web browser is available to that
malicious content. A restrictive CSP policy (defined by the server
operator and sent in an HTTP header which is outside the control of
malicious content), could restrict browser capabilities and mitigate
security problems posed by malicious data.

CSP works by emitting an HTTP header declaring the policy that browsers
should apply. Ideally, this header would be emitted by a layer above
Mercurial (likely the HTTP server doing the WSGI "proxying"). This
works for some CSP policies, but not all.

For example, policies to allow inline JavaScript may require setting
a "nonce" attribute on <script>. This attribute value must be unique
and non-guessable. And, the value must be present in the HTTP header
and the HTML body. This means that coordinating the value between
Mercurial and another HTTP server could be difficult: it is much
easier to generate and emit the nonce in a central location.

This commit introduces support for emitting a
Content-Security-Policy header from hgweb. A config option defines
the header value. If present, the header is emitted. A special
"%nonce%" syntax in the value triggers generation of a nonce and
inclusion in <script> elements in templates. The inclusion of a
nonce does not occur unless "%nonce%" is present. This makes this
commit completely backwards compatible and the feature opt-in.

The nonce is a type 4 UUID, which is the flavor that is randomly
generated. It has 122 random bits, which should be plenty to satisfy
the guarantees of a nonce.
2017-01-10 23:37:08 -08:00
Gregory Szorc
49f189afa0 hgweb: call process_dates() via DOM event listener
All the hgweb templates include mercurial.js in their header. All
the hgweb templates have the same <script> boilerplate to run
process_dates(). This patch factors that function call into
mercurial.js as part of a DOMContentLoaded event listener.
2017-01-10 20:47:48 -08:00
Gregory Szorc
f71c86b7e9 protocol: send application/mercurial-0.2 responses to capable clients
With this commit, the HTTP transport now parses the X-HgProto-<N>
header to determine what media type and compression engine to use for
responses. So far, we only compress responses that are already being
compressed with zlib today (stream response types to specific
commands). We can expand things to cover additional response types
later.

The practical side-effect of this commit is that non-zlib compression
engines will be used if both ends support them. This means if both
ends have zstd support, zstd - not zlib - will be used to compress
data!

When cloning the mozilla-unified repository between a local HTTP
server and client, the benefits of non-zlib compression are quite
noticeable:

  engine     server CPU (s)   client CPU (s)    bundle size
zlib (l=6)      174.1            283.2         1,148,547,026
zstd (l=1)       99.2            267.3         1,127,513,841
zstd (l=3)      103.1            266.9         1,018,861,363
zstd (l=7)      128.3            269.7           919,190,278
zstd (l=10)     162.0               -            894,547,179
none             95.3            277.2         4,097,566,064

The default zstd compression level is 3. So if you deploy zstd
capable Mercurial to your clients and servers and CPU time on
your server is dominated by "getbundle" requests (clients cloning
and pulling) - and my experience at Mozilla tells me this is often
the case - this commit could drastically reduce your server-side
CPU usage *and* save on bandwidth costs!

Another benefit of this change is that server operators can install
*any* compression engine. While it isn't enabled by default, the
"none" compression engine can now be used to disable wire protocol
compression completely. Previously, commands like "getbundle" always
zlib compressed output, adding considerable overhead to generating
responses. If you are on a high speed network and your server is under
high load, it might be advantageous to trade bandwidth for CPU.
Although, zstd at level 1 doesn't use that much CPU, so I'm not
convinced that disabling compression wholesale is worthwhile. And, my
data seems to indicate a slow down on the client without compression.
I suspect this is due to a lack of buffering resulting in an increase
in socket read() calls and/or the fact we're transferring an extra 3 GB
of data (parsing HTTP chunked transfer and processing extra TCP packets
can add up). This is definitely worth investigating and optimizing. But
since the "none" compressor isn't enabled by default, I'm inclined to
punt on this issue.

This commit introduces tons of tests. Some of these should arguably
have been implemented on previous commits. But it was difficult to
test without the server functionality in place.
2016-12-24 15:29:32 -07:00
Gregory Szorc
e1840d5435 httppeer: advertise and support application/mercurial-0.2
Now that servers expose a capability indicating they support
application/mercurial-0.2 and compression, clients can key off
this to say they support responses that are compressed with
various compression formats.

After this commit, the HTTP wire protocol client now sends an
"X-HgProto-<N>" request header indicating its support for
"application/mercurial-0.2" media type and various compression
formats.

This commit also implements support for handling
"application/mercurial-0.2" responses. It simply reads the header
compression engine identifier then routes the remainder of the
response to the appropriate decompressor.

There were some test changes, but only to logging. That points to
an obvious gap in our test coverage. This will be addressed in a
subsequent commit once server support is in place (it is hard to
test without server support).
2016-12-24 15:22:18 -07:00
Gregory Szorc
a95fb0b61b wireproto: advertise supported media types and compression formats
This commit introduces support for advertising a server's support for
media types and compression formats in accordance with the spec defined
in internals.wireproto.

The bulk of the new code is a helper function in wireproto.py to
obtain a prioritized list of compression engines available to the
wire protocol. While not utilized yet, we implement support
for obtaining the list of compression engines advertised by the
client.

The upcoming HTTP protocol enhancements are a bit lower-level than
existing tests (most existing tests are command centric). So,
this commit establishes a new test file that will be appropriate
for holding tests around the functionality of the HTTP protocol
itself.

Rounding out this change, `hg debuginstall` now prints compression
engines available to the server.
2016-12-24 15:21:46 -07:00
Gregory Szorc
73aa240fe1 util: declare wire protocol support of compression engines
This patch implements a new compression engine API allowing
compression engines to declare support for the wire protocol.

Support is declared by returning a compression format string
identifier that will be added to payloads to signal the compression
type of data that follows and default integer priorities of the
engine.

Accessor methods have been added to the compression engine manager
class to facilitate use.

Note that the "none" and "bz2" engines declare wire protocol support
but aren't enabled by default due to their priorities being 0. It
is essentially free from a coding perspective to support these
compression formats, so we do it in case anyone may derive use from
it.
2016-12-24 13:51:12 -07:00
Gregory Szorc
0d089ecdf9 internals: document compression negotiation
As part of adding zstd support to all of the things, we'll need
to teach the wire protocol to support non-zlib compression formats.

This commit documents how we'll implement that.

To understand how we arrived at this proposal, let's look at how
things are done today.

The wire protocol today doesn't have a unified format. Instead,
there is a limited facility for differentiating replies as successful
or not. And, each command essentially defines its own response format.

A significant deficiency in the current protocol is the lack of
payload framing over the SSH transport. In the HTTP transport,
chunked transfer is used and the end of an HTTP response body (and
the end of a Mercurial command response) can be identified by a 0
length chunk. This is how HTTP chunked transfer works. But in the
SSH transport, there is no such framing, at least for certain
responses (notably the response to "getbundle" requests). Clients
can't simply read until end of stream because the socket is
persistent and reused for multiple requests. Clients need to know
when they've encountered the end of a request but there is nothing
simple for them to key off of to detect this. So what happens is
the client must decode the payload (as opposed to being dumb and
forwarding frames/packets). This means the payload itself needs
to support identifying end of stream. In some cases (bundle2), it
also means the payload can encode "error" or "interrupt" events
telling the client to e.g. abort processing. The lack of framing
on the SSH transport and the transfer of its responsibilities to
e.g. bundle2 is a massive layering violation and a wart on the
protocol architecture. It needs to be fixed someday by inventing a
proper framing protocol.

So about compression.

The client transport abstractions have a "_callcompressable()"
API. This API is called to invoke a remote command that will
send a compressible response. The response is essentially a
"streaming" response (no framing data at the Mercurial layer)
that is fed into a decompressor.

On the HTTP transport, the decompressor is zlib and only zlib.
There is currently no mechanism for the client to specify an
alternate compression format. And, clients don't advertise what
compression formats they support or ask the server to send a
specific compression format. Instead, it is assumed that non-error
responses to "compressible" commands are zlib compressed.

On the SSH transport, there is no compression at the Mercurial
protocol layer. Instead, compression must be handled by SSH
itself (e.g. `ssh -C`) or within the payload data (e.g. bundle
compression).

For the HTTP transport, adding new compression formats is pretty
straightforward. Once you know what decompressor to use, you can
stream data into the decompressor until you reach a 0 size HTTP
chunk, at which point you are at end of stream.

So our wire protocol changes for the HTTP transport are pretty
straightforward: the client and server advertise what compression
formats they support and an appropriate compression format is
chosen. We introduce a new HTTP media type to hold compressed
payloads. The header of the payload defines the compression format
being used. Whoever is on the receiving end can sniff the first few
bytes route to an appropriate decompressor.

Support for multiple compression formats is advertised on both
server and client. The server advertises a "compression" capability
saying which compression formats it supports and in what order they
are preferred. Clients advertise their support for multiple
compression formats and media types via the introduced "X-HgProto"
request header.

Strictly speaking, servers don't need to advertise which compression
formats they support. But doing so allows clients to fail fast if
they don't support any of the formats the server does. This is useful
in situations like sending bundles, where the client may have to
perform expensive computation before sending data to the server.

Rather than simply advertise a list of supported compression formats,
we introduce an additional "httpmediatype" server capability
advertising which media types the server supports. This means servers
are explicit about what formats they exchange. IMO, this is superior
to inferring support from other capabilities (like "compression").

By advertising compression support on each request in the "X-HgProto"
header and media type and direction at the server level, we are able
to gradually transition existing commands/responses to the new media
type and possibly compression. Contrast with the old world, where we
only supported a single media type and the use of compression was
built-in to the semantics of the command on both client and server.
In the new world, if "application/mercurial-0.2" is supported,
compression is supported. It's that simple.

It's worth noting that we explicitly don't use "Accept,"
"Accept-Encoding," "Content-Encoding," or "Transfer-Encoding" for
content negotiation and compression. People knowledgeable of the HTTP
specifications will say that we should use these because that's
what they are designed to be used for. They have a point and I
sympathize with the argument. Earlier versions of this commit even
defined supported media types in the "Accept" header. However, my
years of experience rolling out services leveraging HTTP has taught
me to not trust the HTTP layer, especially if you are going outside
the normal spec (such as using a custom "Content-Encoding" value to
represent zstd streams). I've seen load balancers, proxies, and other
network devices do very bad and unexpected things to HTTP messages
(like insisting zlib compressed content is decoded and then re-encoded
at a different compression level or even stripping compression
completely). I've found that the best way to avoid surprises when
writing protocols on top of HTTP is to use HTTP as a dumb transport as
much as possible to minimize the chances that an "intelligent" agent
between endpoints will muck with your data. While the widespread use of
TLS is mitigating many intermediate network agents interfering with
HTTP, there are still problems at the edges, with e.g. the origin HTTP
server needing to convert HTTP to and from WSGI and buggy or
feature-lacking HTTP client implementations. I've found the best way to
avoid these problems is to avoid using headers like "Content-Encoding"
and to bake as much logic as possible into media types and HTTP message
bodies. The protocol changes in this commit do rely on a custom HTTP
request header and the "Content-Type" headers. But we used them before,
so we shouldn't be increasing our exposure to "bad" HTTP agents.

For the SSH transport, we can't easily implement content negotiation
to determine compression formats because the SSH transport has no
content negotiation capabilities today. And without a framing protocol,
we don't know how much data to feed into a decompressor. So in order
to implement compression support on the SSH transport, we'd need to
invent a mechanism to represent content types and an outer framing
protocol to stream data robustly. While I'm fully capable of doing
that, it is a lot of work and not something that should be undertaken
lightly. My opinion is that if we're going to change the SSH transport
protocol, we should take a long hard look at implementing a grand
unified protocol that attempts to address all the deficiencies with
the existing protocol. While I want this to happen, that would be
massive scope bloat standing in the way of zstd support. So, I've
decided to take the easy solution: the SSH transport will not gain
support for multiple compression formats. Keep in mind it doesn't
support *any* compression today. So essentially nothing is changing
on the SSH front.
2016-12-24 13:56:36 -07:00
Gregory Szorc
52ab84abd8 httppeer: extract code for HTTP header spanning
A second consumer of HTTP header spanning will soon be introduced.
Factor out the code to do this so it can be reused.
2016-12-24 14:46:02 -07:00
Gregory Szorc
45174d8965 commands: config option to control bundle compression level
Currently, bundle compression uses the default compression level
for the active compression engine. The default compression level
is tuned as a compromise between speed and size.

Some scenarios may call for a different compression level. For
example, with clone bundles, bundles are generated once and used
several times. Since the cost to generate is paid infrequently,
server operators may wish to trade extra CPU time for better
compression ratios.

This patch introduces an experimental and undocumented config
option to control the bundle compression level. As the inline
comment says, this approach is a bit hacky. I'd prefer for
the compression level to be encoded in the bundle spec. e.g.
"zstd-v2;complevel=15." However, given that the 4.1 freeze is
imminent, I'm not comfortable implementing this user-facing
change without much time to test and consider the implications.
So, we're going with the quick and dirty solution for now.

Having this option in the 4.1 release will enable Mozilla to
easily produce and test zlib and zstd bundles with non-default
compression levels in production. This will help drive future
development of the feature and zstd integration with Mercurial.
2017-01-10 11:20:32 -08:00
Gregory Szorc
9efea3d15c bundle2: allow compression options to be passed to compressor
Compression engines allow options to be passed to them to control
behavior. This patch exposes an argument to bundle2.writebundle()
that passes options to the compression engine when writing compressed
bundles. The argument is honored for both bundle1 and bundle2, the
latter requiring a bit of plumbing to pass the value around.
2017-01-10 11:19:37 -08:00
Valters Vingolds
0b2827adb4 rebase: provide detailed hint to abort message if working dir is not clean
Detailed hint message is now provided when 'pull --rebase' operation detects
unclean working dir, for example:
  abort: uncommitted changes
  (cannot pull with rebase: please commit or shelve your changes first)

Added tests for uncommitted merge, and for subrepo support verifying that same
hint is also passed to subrepo state check.
2017-01-10 09:32:27 +01:00
Yuya Nishihara
d04abe7517 revset: parse variable-length arguments of followlines() by getargsdict() 2017-01-09 16:02:56 +09:00
Yuya Nishihara
b1575d5948 parser: extend buildargsdict() to support variable-length positional args
This can simplify the argument parsing of followlines(). Tests are added by
the next patch.
2017-01-09 15:25:52 +09:00
Yuya Nishihara
3d36c04638 parser: make buildargsdict() precompute position where keyword args start
This prepares for adding *varargs support. See the next patch.
2017-01-09 15:15:21 +09:00
Jun Wu
f7a8f527b8 chgserver: add the setprocname interface
This allows clients to change its process title freely.
2017-01-11 07:36:48 +08:00
Anton Shestakov
836493ef5e hgweb: use archivespecs for links on repo index page too
Moving archivespecs to the module level allows using it from other modules
(such as hgwebdir_mod), and keeping a reference to it in requestcontext allows
current code to just work.
2017-01-10 23:41:58 +08:00
Anton Shestakov
22ec8d9ae5 hgweb: use util.sortdict for archivespecs
Thus we allow dict-like indexing and "in" checks, and also preserve the order
of archive types and can generate links in a certain order (so
requestcontext.archives is no longer needed).
2017-01-10 23:34:39 +08:00
Remi Chaintron
66071d6de5 revlog: REVIDX_EXTSTORED flag
This flag will be used by the lfs extension to mark the revision data as stored
externally.
2017-01-05 17:16:51 +00:00
Remi Chaintron
dfc79cbfc3 revlog: flag processor
Add the ability for revlog objects to process revision flags and apply
registered transforms on read/write operations.

This patch introduces:
- the 'revlog._processflags()' method that looks at revision flags and applies
  flag processors registered on them. Due to the need to handle non-commutative
  operations, flag transforms are applied in stable order but the order in which
  the transforms are applied is reversed between read and write operations.
- the 'addflagprocessor()' method allowing to register processors on flags.
  Flag processors are defined as a 3-tuple of (read, write, raw) functions to be
  applied depending on the operation being performed.
- an update on 'revlog.addrevision()' behavior. The current flagprocessor design
  relies on extensions to wrap around 'addrevision()' to set flags on revision
  data, and on the flagprocessor to perform the actual transformation of its
  contents. In the lfs case, this means we need to process flags before we meet
  the 2GB size check, leading to performing some operations before it happens:
  - if flags are set on the revision data, we assume some extensions might be
    modifying the contents using the flag processor next, and we compute the
    node for the original revision data (still allowing extension to override
    the node by wrapping around 'addrevision()').
  - we then invoke the flag processor to apply registered transforms (in lfs's
    case, drastically reducing the size of large blobs).
  - finally, we proceed with the 2GB size check.

Note: In the case a cachedelta is passed to 'addrevision()' and we detect the
flag processor modified the revision data, we chose to trust the flag processor
and drop the cachedelta.
2017-01-10 16:15:21 +00:00
Remi Chaintron
bd07cff7ec revlog: pass revlog flags to addrevision
Adding the ability to passing flags to addrevision instead of simply passing
default flags to _addrevision will allow extensions relying on flag transforms
to wrap around addrevision() in order to update revlog flags.

The first use case of this patch will be the lfs extension marking nodes as
stored externally when the contents are larger than the defined threshold.

One of the reasons leading to setting flags in addrevision() wrappers in the
flag processor design is that it allows to detect files larger than the 2GB
limit before the check is performed, which allows lfs to transform the contents
into metadata.
2017-01-05 17:16:07 +00:00
Remi Chaintron
6d11b9177b revlog: add 'raw' argument to revision and _addrevision
This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.

Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
2017-01-05 17:16:07 +00:00
Jun Wu
b61b02a865 chg: remove getpager support
We have enough bits to switch to the new chg pager code path in runcommand.
So just remove the legacy getpager support.

This is a red-only patch, and will break chg's pager support temporarily.
2017-01-10 06:59:39 +08:00
Jun Wu
2fc8d9fe86 chgserver: implement chgui._runpager
This patch implements chgui._runpager in a relatively simple way. A more
clean way is to move the core logic of "attachio" to "ui", which will be
done later after chg runs uisetup per request.
2017-01-10 06:59:31 +08:00
Jun Wu
ed9bebc440 chgserver: make S channel support pager request
This patch adds the "pager" support for the S channel. The pager API allows
running some subcommands, namely attachio, and waiting for the client to be
properly synchronized.
2017-01-10 06:59:21 +08:00
Jun Wu
7085592213 chgserver: use util.shellenviron
This avoids code duplication.
2017-01-10 06:58:51 +08:00
Jun Wu
c50e85b0d3 util: extract the logic calculating environment variables
The method will be reused in chgserver. Move it out so it can be reused.
2017-01-10 06:58:02 +08:00
Anton Shestakov
8d71b91ef9 hgweb: generate archive links in order
It would be nice for archive links to always be in a certain commonly used
order, such as 'zip', 'bz', 'gzip2'. Repo index page (hgwebdir_mod) already
shows archive links in this order, let's do the same in hgweb_mod.

Sadly, archivespecs is a regular unordered dict, and collections.OrderedDict is
new in 2.7. But requestcontext.archives is a tuple of archive types, so it can
be used as an index to archivespecs.
2017-01-08 00:52:54 +08:00
Anton Shestakov
5dfa3509d4 hgweb: use archivespecs (dict) instead of archives (tuple) for "in" check 2017-01-08 01:24:45 +08:00
Matt Harbison
1e958d800e help: merge the various operator sections of revsets, filesets and templates
Having sections for specific operator types assumes the user already knows what
type of operators are supported.  By having a common heading, the user can
simply lookup help for "(revsets|filesets|templates).operators".
2017-01-08 12:05:10 -05:00
Matt Harbison
28a570f58c help: apply the section headings from revsets to templates
Unlike filesets, there are a few distinct headings that are not shared with
revsets.  But common names are used where possible.
2017-01-08 02:43:01 -05:00
Matt Harbison
0b96ef6f6b help: apply the section headings from revsets to filesets
This has the nice property of visually breaking up the wall of text.  It also
allows specific smaller sections to be called out.  For example,
`hg help filesets.predicates` now prints just the predicate section.  At the
moment, the revset headings are a superset of the fileset headings, so there is
consistency in how example, predicate and operator help is called out.

The reference to `hg help patterns` was moved to the overview section, so that
it isn't stuck in the examples section.
2017-01-08 02:40:36 -05:00
Jun Wu
56484854f9 chgserver: check type passed to S channel
It currently only supports the "system" type. Add an explicit check.
2017-01-06 16:12:25 +00:00
Jun Wu
734e02b02d chg: send type information via S channel (BC)
Previously S channel is only used to send system commands. It will also be
used to send pager commands. So add a type parameter.

This breaks older chg clients. But chg and hg should always come from a
single commit and be packed into a single package. Supporting running
inconsistent versions of chg and hg seems to be unnecessarily complicated
with little benefit. So just make the change and assume people won't use
inconsistent chg with hg.
2017-01-06 16:11:03 +00:00
Yuya Nishihara
630684236e commit: fix unmodified message detection for the "--- >8 ----" magic
We need the raw editortext to be compared with the templatetext.
2017-01-06 22:50:04 +09:00
Denis Laxalde
6dab59dff8 summary: use ui.label and join to write evolution troubles
Follow-up on da7b2bf5ad52 to avoid a convoluted loop.
2017-01-07 12:24:15 +01:00
Denis Laxalde
ea885ed1d6 log: drop unnecessary ui.note label from "trouble: " line
Follow-up on 38b8a4a2230c and 3f2425cfd46f.
2017-01-07 12:07:56 +01:00
Denis Laxalde
20d1dad252 revset: add a followlines(file, fromline, toline[, rev]) revset
This revset returns the history of a range of lines (fromline, toline) of a
file starting from `rev` or the current working directory.

Added tests in test-annotate.t which already contains a reasonably complex
repository.
2017-01-04 16:47:49 +01:00
Denis Laxalde
7092fa95d8 context: add a blockancestors(fctx, fromline, toline) function
This yields ancestors of `fctx` by only keeping changesets touching the file
within specified linerange = (fromline, toline).

Matching revisions are found by inspecting the result of `mdiff.allblocks()`,
filtered by `mdiff.blocksinrange()`, to find out if there are blocks of type
"!" within specified line range.

If, at some iteration, an ancestor with an empty line range is encountered,
the algorithm stops as it means that the considered block of lines actually
has been introduced in the revision of this iteration. Otherwise, we finally
yield the initial revision of the file as the block originates from it.

When a merge changeset is encountered during ancestors lookup, we consider
there's a diff in the current line range as long as there is a diff between
the merge changeset and at least one of its parents (in the current line
range).
2016-12-28 23:03:37 +01:00
Denis Laxalde
dc8e8fcbf9 mdiff: add a "blocksinrange" function to filter diff blocks by line range
The function filters diff blocks as generated by mdiff.allblock function based
on whether they are contained in a given line range based on the "b-side" of
blocks.
2017-01-03 18:15:58 +01:00
Denis Laxalde
5e3ca8d1ab summary: add evolution "troubles" information to summary output
Extend the "parent: " lines in summary with the list of evolution "troubles"
in parentheses, when the parent is troubled.
2017-01-06 14:35:22 +01:00
Denis Laxalde
8ebbb5679b summary: use the same labels as log command in "parent: " line
Re-use the cmdutil._changesetlabels function introduced in c400c86d547f to
have consistent labels between the "changeset: " line in log command and the
"parent: " line in summary.
2017-01-06 14:34:34 +01:00
Denis Laxalde
b2aed04403 templates: display evolution "troubles" in command line style 2017-01-06 13:50:52 +01:00
Denis Laxalde
0c89f1cb3e templatekw: add a "troubles" template keyword
The "troubles" template keyword returns a list of evolution troubles.
It is EXPERIMENTAL, as anything else related to changeset evolution.

Test it in test-obsolete.t which has troubled changesets.
2017-01-06 13:50:16 +01:00
Denis Laxalde
627f47db5c cmdutil: add missing "i18n" comment about "trouble: " line
Follow-up on 38b8a4a2230c per late review.
2017-01-06 12:36:21 +01:00
Gregory Szorc
05ec82c913 hgweb: link to raw-file on annotation page (BC)
Every other template has the "raw" link load "raw-file." However,
fileannotate.tmpl's "raw" link loads "raw-annotate." This feels
inconsistent and wrong.

As far as I can tell, linking to the "raw annotate" view has occurred
since 2006.
2016-12-28 15:48:17 -07:00
Martin von Zweigbergk
e1f0ba8ef9 repair: combine two loops over changelog revisions
This just saves a few lines.
2017-01-04 10:35:04 -08:00
Martin von Zweigbergk
92d0334538 repair: speed up stripping of many roots
repair.strip() expects a set of root revisions to strip. It then
builds the full set of descedants by walking the descandants of
each. It is rare that more than a few roots get passed in, but if that
happens, it will wastefully walk the changelog for each root. So let's
just walk it once.

I noticed this because the narrowhg extension was passing not only
roots, but all the commits to strip. When there were tens of thousands
of commits to strip, this resulted in quadratic behavior with that
extension.
2017-01-04 10:07:12 -08:00
Sean Farley
7f456ac7c6 config: add docs for ignoring all text below in the editor
This is an example of how to use the new skip-from-there string for ignoring the
diff in a commit message.
2017-01-04 22:32:42 -06:00
Sean Farley
52b92c45af cmdutil: add special string that ignores rest of text
Similar to git, we add a special string:

  HG: ------------------------ >8 ------------------------

that means anything below it is ignored in a commit message.

This is helpful for integrating with third-party tools that display the
2016-12-31 15:36:36 -06:00
Yuya Nishihara
a7a60a2e43 revset: drop TODO comment about sorting issue of fullreposet
The bootstrapping issue was addressed at the parsing phase and we expect
that fullreposet.__and__() fully complies to the smartset API, in which
'self & other' should return a result set in self's order. See also
ab938e7ae803.
2016-05-14 20:52:44 +09:00
Yuya Nishihara
2fa6a1e65e revset: document wdir() as an experimental function
Let's resurrect the docstring since our help module can detect the EXPERIMENTAL
tag and display it only if -v is specified.

This patch updates the test added by bbdfa2d5aaa2 since wdir() is now
documented.
2017-01-05 22:53:42 +09:00
Yuya Nishihara
ec99971228 revset: categorize wdir() as very fast function
The cost of wdir() should be identical to or cheaper than _intlist().
2016-08-20 17:50:23 +09:00
Yuya Nishihara
caee220313 templater: provide loop counter as "index" keyword
This was originally written for JSON templating where we would have to be
careful to not add extra comma, but seems generally useful.

Inner loop started by % operator has its own counter.
2016-04-22 21:46:33 +09:00
Yuya Nishihara
717a34e2df templater: rename variable "i" to "v" in runmap()
I want to reuse "i" for index.
2016-04-22 21:45:06 +09:00
Yuya Nishihara
d6502e3a12 formatter: reorder code that builds template mapping
This makes the future patch slightly simpler.
2017-04-02 22:43:18 +09:00
Jun Wu
e557e14680 revlog: avoid applying delta chain on cache hit
Previously, revlog.revision(raw=False) may try to apply the delta chain
on _cache hit. That happens if flags are non-empty. This patch makes rawtext
reused so delta chain application is avoided.

"_cache" and "rev" are moved a bit to avoid unnecessary assignments.
2017-04-02 18:40:13 -07:00
Jun Wu
5f26616d71 revlog: indent block to make review easier 2017-04-02 18:29:24 -07:00
Jun Wu
2ab18ee566 revlog: avoid calculating "flags" twice in revision()
This is more consistent with other code in "revision()" - prefer performance
to code length.
2017-04-02 18:25:12 -07:00
Jun Wu
20165e0767 revlog: use raw revision for rawsize
When writing the revlog-ng index, the third field is len(rawtext). See
revlog._addrevision:

    textlen = len(rawtext)
    ....
    e = (offset_type(offset, flags), l, textlen,
         base, link, p1r, p2r, node)
    self.index.insert(-1, e)

Therefore, revlog.index[rev][2] returned by revlog.rawsize should be
len(rawtext), where "rawtext" is revlog.revision(raw=True).

Unfortunately it's hard to add a test for this code path because "if l >= 0"
catches most cases.
2017-04-02 18:57:03 -07:00
Yuya Nishihara
d4c8257977 revsetlang: enable optimization of 'x + y' expression
It's been disabled since fa623f8a8cdd, but it can be enabled now as the
ordering requirement is resolved at analyze().
2016-05-14 20:51:57 +09:00
Gregory Szorc
04c3125727 commands: update help for "unbundle"
Similar to the recent change to "bundle," this command no longer
just deals with "changegroup" data.
2017-04-01 13:43:52 -07:00
Gregory Szorc
5c890f5f16 commands: update help for "bundle"
We now have a dedicated help topic to describe bundle specification
strings. Let's update `hg bundle`'s documentation to reflect its
existence.

While I was hear, I also tweaked some wording which I felt was out
of date and needed tweaking. Specifically, `hg bundle` no longer
just deals with "changegroup" data: it can also generate files
that have non-changegroup data.
2017-04-01 13:43:43 -07:00
Gregory Szorc
2f6dff3311 help: document bundle specifications
I softly formalized the concept of a "bundle specification" a while
ago when I was working on clone bundles and stream clone bundles and
wanted a more robust way to define what exactly is in a bundle file.

The concept has existed for a while. Since it is part of the clone
bundles feature and exposed to the user via the "-t" argument to
`hg bundle`, it is something we need to support for the long haul.

After the 4.1 release, I heard a few people comment that they didn't
realize you could generate zstd bundles with `hg bundle`. I'm
partially to blame for not documenting it in bundle's docstring.

Additionally, I added a hacky, experimental feature for controlling
the compression level of bundles in 054e64c4d837. As the commit
message says, I went with a quick and dirty solution out of time
constraints. Furthermore, I wanted to eventually store this
configuration in the "bundlespec" so it could be made more flexible.

Given:

a) bundlespecs are here to stay
b) we don't have great documentation over what they are, despite being
   a user-facing feature
c) the list of available compression engines and their behavior isn't
   exposed
d) we need an extensible place to modify behavior of compression
   engines

I want to move forward with formalizing bundlespecs as a user-facing
feature. This commit does that by introducing a "bundlespec" help
page. Leaning on the just-added compression engine documentation
and API, the topic also conveniently lists available compression
engines and details about them. This makes features like zstd
bundle compression more discoverable. e.g. you can now
`hg help -k zstd` and it lists the "bundlespec" topic.
2017-04-01 13:42:06 -07:00
Gregory Szorc
252949c70b util: document bundle compression
An upcoming patch will add support for documenting bundle
specifications in more detail. As part of this, we'd like to
enumerate available bundle compression formats. In order to do
this, we need to provide the help mechanism a dict of names
and objects with docstrings.

This patch adds docstrings to compengine.bundletype and adds
a function for retrieving a dict of them. The code is not yet
used.
2017-04-01 13:29:01 -07:00
Gregory Szorc
64e2de02bd hgweb: extract path traversal checking into standalone function
A common exploit in web applications that access paths is to insert
path separator strings like ".." to try to get the server to serve up
files it shouldn't.

We have code for detecting this in staticfile(). A subsequent commit
will need to perform this test as well. Since this is security code,
let's factor the check so we don't have to reinvent the wheel.
2017-03-31 21:47:26 -07:00
Gregory Szorc
bfa11ec1e0 hgweb: use context manager for file I/O 2017-03-31 22:30:38 -07:00
Martin von Zweigbergk
7309b07c23 tags: rename "head" to "node" where we don't care
Followup to 8802e3f8cde1 (tags: extract fnode retrieval into its own
function, 2017-03-28) in which the "for head in head" became "for head
in nodes".
2017-04-03 10:01:38 -07:00
Martin von Zweigbergk
5c6cd2e435 manifest: update comment to be about bytearray
Looks like a leftover from 54d8e724da64 (py3: use bytearray() instead
of array('c', ...) constructions, 2017-03-12).
2017-04-03 08:45:24 -07:00
Denis Laxalde
b2e35d013c hgweb: rename linerangelog.js as followlines.js
So that the file name matches both the feature name and user facing vocabulary
(e.g. the revset function).
2017-04-03 10:02:55 +02:00
Denis Laxalde
ebca8029e1 hgweb: rely on a specific class to change cursor type in followlines UI
The previous CSS rule would also apply in pages where followlines UI was not
available (e.g. "changeset" view at /rev/<node>/). We insert a
"followlines-select" class in JavaScript on actually selectable lines and
restrict the CSS selector to use it.
2017-04-03 09:58:36 +02:00
Denis Laxalde
135e6c8920 hgweb: use a function expression for the install listener of followlines UI
We define the listener of document's "DOMContentLoaded" inline in registration
and use a function expression (anonymous) with everything inside. This makes
it clearer that this file is not a library of JavaScript functions but rather
an executable script.

(Most of changes consists of reindenting the "followlinesBox" function, so
mostly white space changes.)
2017-04-03 09:40:25 +02:00
Yuya Nishihara
ce44b9ffb0 formatter: use templatefilters.json()
Now _jsonifyobj() is identical to templatefilters.json(paranoid=False).
2017-04-02 12:02:17 +09:00
Yuya Nishihara
ccd02d13b3 templatefilters: use list comprehension in json()
Not important, but the code slightly looks better.
2017-04-02 11:54:24 +09:00
Yuya Nishihara
67ba45c194 templatefilters: unroll handling of None/False/True
It doesn't make sense to use a dict here.
2017-04-02 11:51:25 +09:00
Yuya Nishihara
c5537c8613 templatefilters: drop callable support from json()
This backs out 1e5c61c691c5. A callable should be evaluated beforehand
by templater.runsymbol().
2017-04-02 11:46:49 +09:00
Yuya Nishihara
33d687d3d1 ui: use bytes IO and convert EOL manually in ui.editor()
Text IO sucks on Python 3 as it must be a unicode stream. We could introduce
a wrapper that converts unicode back to bytes, but it wouldn't be simple to
handle offsets transparently from/to underlying IOBase API.

Fortunately, we don't need to process huge text files, so let's stick to
bytes IO and convert EOL in memory.
2017-03-29 21:43:38 +09:00
Yuya Nishihara
40564e2c98 util: add helper to convert between LF and native EOL
See the next patch for why.
2017-03-29 21:40:15 +09:00
Yuya Nishihara
1163409660 util: extract pure tolf/tocrlf() functions from eol extension
This can be used for EOL conversion of text files.
2017-03-29 21:28:54 +09:00
Yuya Nishihara
2f682a6206 pycompat: provide bytes os.linesep 2017-03-29 21:23:28 +09:00
Yuya Nishihara
236d81fcc8 pycompat: introduce identity function as a compat stub
I was sometimes too lazy to use 'str' instead of 'lambda a: a'. Let's add
a named function for that purpose.
2017-03-29 21:13:55 +09:00
Gregory Szorc
62d4252847 show: new extension for displaying various repository data
Currently, Mercurial has a number of commands to show information. And,
there are features coming down the pipe that will introduce more
commands for showing information.

Currently, when introducing a new class of data or a view that we
wish to expose to the user, the strategy is to introduce a new command
or overload an existing command, sometimes both. For example, there is
a desire to formalize the wip/smartlog/underway/mine functionality that
many have devised. There is also a desire to introduce a "topics"
concept. Others would like views of "the current stack." In the
current model, we'd need a new command for wip/smartlog/etc (that
behaves a lot like a pre-defined alias of `hg log`). For topics,
we'd likely overload `hg topic[s]` to both display and manipulate
topics.

Adding new commands for every pre-defined query doesn't scale well
and pollutes `hg help`. Overloading commands to perform read-only and
write operations is arguably an UX anti-pattern: while having all
functionality for a given concept in one command is nice, having a
single command doing multiple discrete operations is not. Furthermore,
a user may be surprised that a command they thought was read-only
actually changes something.

We discussed this at the Mercurial 4.0 Sprint in Paris and decided that
having a single command where we could hang pre-defined views of
various data would be a good idea. Having such a command would:

* Help prevent an explosion of new query-related commands
* Create a clear separation between read and write operations
  (mitigates footguns)
* Avoids overloading the meaning of commands that manipulate data
  (bookmark, tag, branch, etc) (while we can't take away the
  existing behavior for BC reasons, we now won't introduce this
  behavior on new commands)
* Allows users to discover informational views more easily by
  aggregating them in a single location
* Lowers the barrier to creating the new views (since the barrier
  to creating a top-level command is relatively high)

So, this commit introduces the `hg show` command via the "show"
extension. This command accepts a positional argument of the
"view" to show. New views can be registered with a decorator. To
prove it works, we implement the "bookmarks" view, which shows a
table of bookmarks and their associated nodes.

We introduce a new style to hold everything used by `hg show`.

For our initial bookmarks view, the output varies from `hg bookmarks`:

* Padding is performed in the template itself as opposed to Python
* Revision integers are not shown
* shortest() is used to display a 5 character node by default (as
  opposed to static 12 characters)

I chose to implement the "bookmarks" view first because it is simple
and shouldn't invite too much bikeshedding that detracts from the
evaluation of `hg show` itself. But there is an important point
to consider: we now have 2 ways to show a list of bookmarks. I'm not
a fan of introducing multiple ways to do very similar things. So it
might be worth discussing how we wish to tackle this issue for
bookmarks, tags, branches, MQ series, etc.

I also made the choice of explicitly declaring the default show
template not part of the standard BC guarantees. History has shown
that we make mistakes and poor choices with output formatting but
can't fix these mistakes later because random tools are parsing
output and we don't want to break these tools. Optimizing for human
consumption is one of my goals for `hg show`. So, by not covering
the formatting as part of BC, the barrier to future change is much
lower and humans benefit.

There are some improvements that can be made to formatting. For
example, we don't yet use label() in the templates. We obviously
want this for color. But I'm not sure if we should reuse the existing
log.* labels or invent new ones. I figure we can punt that to a
follow-up.

At the aforementioned Sprint, we discussed and discarded various
alternatives to `hg show`.

We considered making `hg log <view>` perform this behavior. The main
reason we can't do this is because a positional argument to `hg log`
can be a file path and if there is a conflict between a path name and
a view name, behavior is ambiguous. We could have introduced
`hg log --view` or similar, but we felt that required too much typing
(we don't want to require a command flag to show a view) and wasn't
very discoverable. Furthermore, `hg log` is optimized for showing
changelog data and there are things that `hg display` could display
that aren't changelog centric.

There were concerns about using "show" as the command name.

Some users already have a "show" alias that is similar to `hg export`.

There were also concerns that Git users adapted to `git show` would
be confused by `hg show`'s different behavior. The main difference
here is `git show` prints an `hg export` like view of the current
commit by default and `hg show` requires an argument. `git show`
can also display any Git object. `git show` does not support
displaying more complex views: just single objects. If we
implemented `hg show <hash>` or `hg show <identifier>`, `hg show`
would be a superset of `git show`. Although, I'm hesitant to do that
at this time because I view `hg show` as a higher-level querying
command and there are namespace collisions between valid identifiers
and registered views.

There is also a prefix collision with `hg showconfig`, which is an
alias of `hg config`.

We also considered `hg view`, but that is already used by the "hgk"
extension.

`hg display` was also proposed at one point. It has a prefix collision
with `hg diff`. General consensus was "show" or "view" are the best
verbs. And since "view" was taken, "show" was chosen.

There are a number of inline TODOs in this patch. Some of these
represent decisions yet to be made. Others represent features
requiring non-trivial complexity. Rather than bloat the patch or
invite additional bikeshedding, I figured I'd document future
enhancements via TODO so we can get a minimal implmentation landed.
Something is better than nothing.
2017-03-24 19:19:00 -07:00
Jun Wu
7d2129ac1d verify: fix length check
According to the document added above, we should check L1 == L2, and the
only way to get L1 in all cases is to call "rawsize()", and the only way to
get L2 is to call "revision(raw=True)". Therefore the fix.

Meanwhile there are still a lot of things about flagprocessor broken in
revlog.py. Tests will be added after revlog.py gets fixed.
2017-03-29 14:49:14 -07:00
Jun Wu
5de9c76dc1 verify: document corner cases
It seems a good idea to list all kinds of "surprises" and expected behavior
to make the upcoming changes easier to understand.

Note: the comment added does not reflect the actual behavior of the current
code.
2017-03-29 14:45:01 -07:00
Denis Laxalde
c2ed8e445d hgweb: expose a followlines UI in filerevision view
In filerevision view (/file/<rev>/<fname>) we add some event listeners on
mouse clicks of <span> elements in the <pre class="sourcelines"> block.
Those listeners will capture a range of lines selected between two mouse
clicks and a box inviting to follow the history of selected lines will then
show up. Selected lines (i.e. the block of lines) get a CSS class which make
them highlighted. Selection can be cancelled (and restarted) by either
clicking on the cancel ("x") button in the invite box or clicking on any other
source line. Also clicking twice on the same line will abort the selection and
reset event listeners to restart the process.

As a first step, this action is only advertised by the "cursor: cell" CSS rule
on source lines elements as any other mechanisms would make the code
significantly more complicated. This might be improved later.

All JavaScript code lives in a new "linerangelog.js" file, sourced in
filerevision template (only in "paper" style for now).
2017-03-29 22:26:16 +02:00
Jun Wu
7151069c4a revlog: add a fast path for revision(raw=False)
If cache hit and flags are empty, no flag processor runs and "text" equals
to "rawtext". So we check flags, and return rawtext.

This resolves performance issue introduced by a previous patch.
2017-03-30 21:21:15 -07:00
Jun Wu
ae8c9ce375 revlog: make _addrevision only accept rawtext
All 3 users of _addrevision use raw:

  - addrevision: passing rawtext to _addrevision
  - addgroup: passing rawtext and raw=True to _addrevision
  - clone: passing rawtext to _addrevision

There is no real user using _addrevision(raw=False). On the other hand,
_addrevision is low-level code dealing with raw revlog deltas and rawtexts.
It should not transform rawtext to non-raw text.

This patch removes the "raw" parameter from "_addrevision", and does some
rename and doc change to make it clear that "_addrevision" expects rawtext.

Archeology shows 886a08012bbe added "raw" flag to "_addrevision", follow-ups
fe1e206cb389 and 1cfa6239c923 seem to make the flag unnecessary.

test-revlog-raw.py no longer complains.
2017-03-30 18:38:03 -07:00
Jun Wu
9a6035a980 revlog: use raw revisions in clone
test-revlog-raw.py now shows "clone test passed", but there is more to fix.
2017-03-30 18:24:23 -07:00
Jun Wu
2468c838bd revlog: use raw revisions in revdiff
See the added comment. revdiff is meant to output the raw delta that will be
written to revlog. It should use raw.

test-revlog-raw.py now shows "addgroupcopy test passed", but there is more
to fix.
2017-03-30 18:23:27 -07:00
Jun Wu
558f5cce61 revlog: use raw content when building delta
Using external content provided by flagprocessor when building revlog delta
is wrong, because deltas are applied to raw contents in revlog.

This patch fixes the above issue by adding "raw=True".

test-revlog-raw.py now shows "local test passed", but there is more to fix.
2017-03-30 17:58:03 -07:00
Jun Wu
50b232c61f revlog: fix _cache usage in revision()
As documented at revlog.__init__, revlog._cache stores raw text.

The current read and write usage of "_cache" in revlog.revision lacks of
raw=True check.

This patch fixes that by adding check about raw, and storing rawtext
explicitly in _cache.

Note: it may slow down cache hit code path when raw=False and flags=0. That
performance issue will be fixed in a later patch.

test-revlog-raw now points us to a new problem.
2017-03-30 15:34:08 -07:00
Jun Wu
fec2bbc9e9 revlog: rename some "text"s to "rawtext"
This makes code easier to understand. "_addrevision" is left untouched - it
will be changed in a later patch.
2017-03-30 14:56:09 -07:00
Jun Wu
b483b1b607 revlog: clarify flagprocessor documentation
The words "text", "newtext", "bool" could be confusing. Use explicit "text"
or "rawtext" and document more about the "bool".
2017-03-30 07:59:48 -07:00
Pierre-Yves David
e3a8075346 hook: add hook name information to external hook
While we are here, we can also add the hook name information to external hook.
2017-03-31 11:53:56 +02:00
Pierre-Yves David
833b1335ba hook: provide hook type information to external hook
The python hooks have access to the hook type information. There is not reason
for external hook to not be aware of it too.

For the record my use case is to make sure a hook script is configured for the
right type.
2017-03-31 11:08:11 +02:00
Pierre-Yves David
e6f763fb31 hook: use 'htype' in 'hook'
Same rational as for 'runhooks', we fix the naming in another function.
2017-03-31 11:06:42 +02:00
Pierre-Yves David
9eacfb523e hook: use 'htype' in 'runhooks'
Same rational as for '_pythonhook', 'htype' is more accurate and less error
prone. We just fixed an error from the 'name'/'hname' confusion and this should
prevent them in the future.
2017-03-31 11:03:23 +02:00
Pierre-Yves David
be64e99e37 hook: fix name used in untrusted message
The name used in the message we issue when a hook is untrusted was using "name"
which is actually the hook type and not the name of the hook.
2017-03-31 11:02:05 +02:00
Pierre-Yves David
80012baa72 hook: use "htype" as variable name in _pythonhook
We rename 'name' to 'htype' because it fits the variable content better.
Multiple python hooks already use 'htype' as a name for the argument. This makes
the difference with "hname" clearer and the code less error prone.
2017-03-31 10:59:37 +02:00
Matt Harbison
5bca44776e templatefilter: add support for 'long' to json()
When disabling the '#requires serve' check in test-hgwebdir.t and running it on
Windows, several 500 errors popped up when querying '?style=json', with the
following in the error log:

    File "...\\mercurial\\templater.py", line 393, in runfilter
      "keyword '%s'") % (filt.func_name, dt))
    Abort: template filter 'json' is not compatible with keyword 'lastchange'

The swallowed exception at that point was:

    File "...\\mercurial\\templatefilters.py", line 242, in json
      raise TypeError('cannot encode type %s' % obj.__class__.__name__)
    TypeError: cannot encode type long

This corresponds to 'lastchange' being populated by hgweb.common.get_stat(),
which uses os.stat().st_mtime.  os.stat_float_times() is being disabled in util,
so the type for the times is 'long' on Windows, and 'int' on Linux.
2017-04-01 00:21:17 -04:00
Denis Laxalde
495eefece3 hgweb: prefix line id by ctx shortnode in filelog when patches are shown
When "patch" query parameter is present in requests to filelog view, line ids
in patches diff are no longer unique in the page since several patches are
shown on the same page. We now prefix line id by changeset shortnode when
several patches are displayed in the same page to have unique line ids
overall.
2017-03-30 21:40:10 +02:00
Matt Harbison
8b52949bfa sslutil: clarify internal documentation
I ran into this python issue with an incomplete certificate chain on Windows
recently, and this is the clarification that came from that experimenting.  The
comment I left on the bug tracker [1] with a reference to the CPython code [2]
indicates that the original problem I had is a different bug, but happened to
be mentioned under issue20916 on the Python bug tracker.

[1] https://bz.mercurial-scm.org/show_bug.cgi?id=5313#c7
[2] https://hg.python.org/cpython/file/v2.7.12/Modules/_ssl.c#l628
2017-03-29 09:54:34 -04:00
Jun Wu
6ac1cb4157 unionrepo: avoid unnecessary node -> rev conversion 2017-03-29 16:28:51 -07:00
Jun Wu
9c45c07c8b bundlerepo: avoid unnecessary node -> rev conversion 2017-03-29 16:28:00 -07:00
Jun Wu
7f99b86dbd revlog: avoid unnecessary node -> rev conversion 2017-03-29 16:23:04 -07:00
Jun Wu
47e81ec208 hardlink: check directory's st_dev when copying files
Previously, when copying a file, copyfiles will compare src's st_dev with
dirname(dst)'s st_dev, to decide whether to enable hardlink or not.

That could have issues on Linux's overlayfs, where stating directories could
result in different st_dev from st_dev of stating files, even if both the
directories and the files exist in the overlay's upperdir.

This patch fixes it by checking dirname(src) instead. It's more consistent
because we are checking directories for both src and dest.

That fixes test-hardlinks.t running on common Docker setups.
2017-03-29 12:37:03 -07:00
Jun Wu
7621302bee hardlink: duplicate hardlink detection for copying files and directories
A later patch will change one of them so they diverge.
2017-03-29 12:26:46 -07:00
Jun Wu
29358be572 hardlink: extract topic text logic of copyfiles
The topic text shows whether it's "linking" or "copying", based on
"hardlink" value. The function is extracted so a later patch can reuse it.
2017-03-29 12:21:15 -07:00
Pulkit Goyal
46eedbfb56 color: replace str() with pycompat.bytestr() 2017-03-29 14:47:52 +05:30
Pulkit Goyal
87883687d7 diff: slice over bytes to make sure conditions work normally
Both of this are part of generating `hg diff` on python 3.
2017-03-26 20:52:51 +05:30
Gregory Szorc
af6c4ae0b9 minirst: remove redundant _admonitions set
As Yuya pointed out during a review a month ago, _admonitions and
_admonitiontitles are largely redundant. With the last commit, they
are exactly redundant. So, remove _admonitions and use
_admonitiontitles.keys() instead.
2017-03-29 20:19:26 -07:00
Gregory Szorc
ceec73bcc5 minirst: remove "admonition" from _admonitions
The "admonition" rst primitive is split into "specific" admonitions
("attention," "caution," etc) and the "generic" admonition
("admonition"). For more, see
http://docutils.sourceforge.net/docs/ref/rst/directives.html#admonitions

The _admonitions set and keys of the _admonitiontitles dict
overlap exactly except _admonitions has an "admonition" entry.
Nowhere in Mercurial is the "admonition" admonition directive used.
Even if it were, it doesn't have a title, so it wouldn't be rendered
correctly.

So, let's remove "admonition" from the set of recognized admonition
directives.
2017-03-29 20:05:18 -07:00
Gregory Szorc
67bccc1d16 minirst: reindent _admonitiontitles
I don't like the verical indent.

While I was here, I cleaned up some whitespace and added a trailing
comma on the last element.
2017-03-29 19:59:47 -07:00
Pierre-Yves David
3663d84e7b tags: extract filenode filtering into its own function
We'll also need to reuse this logic so we extract it into its own function. We
document some of the logic in the process.
2017-03-28 06:23:28 +02:00
Pierre-Yves David
f48c2e3542 tags: extract tags computation from fnodes into its own function
I'm about to introduce code that needs to perform such computation on
"arbitrary" nodes. The logic is extracted into its own function for reuse.
2017-03-28 06:08:12 +02:00
Pierre-Yves David
ef96a2bca6 tags: only return 'alltags' in 'findglobaltags'
This is minor update along the way. We simplify the 'findglobaltags' function to
only return the tags. Since no existing data is reused, we know that all tags
returned are global and we can let the caller get that information if it cares
about it.
2017-03-28 07:41:23 +02:00
Pierre-Yves David
a240451f10 tags: make argument 'tagtype' optional in '_updatetags'
This is the next step from the previous changesets, we are now ready to use this
function in a simpler way.
2017-03-28 07:39:10 +02:00
Pierre-Yves David
6c2cdbcd03 tags: reorder argument of '_updatetags'
We move all arguments related to tagtype to the end, together. This will allow
us to make these arguments optional and reuse of this logic for callers that do
not care about the tag types.
2017-03-28 07:38:10 +02:00
Pierre-Yves David
8da79dae5a tags: do not feed dictionaries to 'findglobaltags'
The code asserts that these dictionary are empty. So we can be more explicit
and have the function return the dictionaries directly.
2017-03-28 06:13:49 +02:00
Pierre-Yves David
10d5c16583 tags: extract fnode retrieval into its own function
My main goal here is to be able to reuse this logic easily. As a side effect
this important logic is now insulated and the code is clearer.
2017-03-28 06:01:31 +02:00
Denis Laxalde
6af640ec03 hgweb: fix diff hunks filtering by line range in webutil.diffs()
The previous clause for filter out a diff hunk was too restrictive. We need to
consider the following cases (assuming linerange=(lb, ub) and the @s2,l2
hunkrange):

            <-(s2)--------(s2+l2)->
      <-(lb)---(ub)->
               <-(lb)---(ub)->
                           <-(lb)---(ub)->

previously on the first and last situations were considered.

In test-hgweb-filelog.t, add a couple of lines at the beginning of file "b" so
that the line range we will follow does not start at the beginning of file.
This covers the change in aforementioned diff hunk filter clause.
2017-03-29 12:07:07 +02:00
Denis Laxalde
f25fa2b8d5 summary: display obsolete state of parents
Extend the "parent: " lines in summary to display "(obsolete)" when the parent
is obsolete.
2017-03-25 11:30:08 +01:00
Denis Laxalde
aaf9382123 templates: add "changeset.obsolete" label in command line style
Following respective change in cmdutil.changeset_printer.
2017-03-25 10:40:29 +01:00
Denis Laxalde
f5bac903ae templates: shorten definition of changeset labels in command-line style
We'll add more labels and the line is already quite long, so let's define a
variable to hold all evolution "troubles" labels.
2017-03-28 22:38:45 +02:00
Denis Laxalde
3a0bd7f34e templates: use separate() to build changeset labels in command-line style 2017-03-28 22:36:22 +02:00
Denis Laxalde
5f616436cb templatekw: add an "obsolete" keyword
Definition is the same as the one in evolve extension.
2017-03-25 10:34:11 +01:00
Denis Laxalde
2ae8148594 cmdutil: add a "changeset.obsolete" label in changeset_printer
Until now there were no label to highlight obsolete changesets in log output,
only evolution troubles (unstable, bumped, divergent) are supported. We add a
"changeset.obsolete" label on changeset entries produced by changeset_printer
so that obsolete changesets can be highlighted in log output. This is useful
because, unless using a graph log where obsolete changesets have a 'x' marker,
there's no way to identify obsolete changesets. And even in graph mode, when
working directory's parent is obsolete, we get a '@' marker and we do not see
it as obsolete.
2017-03-25 09:39:07 +01:00
Gregory Szorc
df777290b9 fileset: perform membership test against set for status queries
Previously, fileset functions operating on status items performed
membership tests against a list of items. When there are thousands
of items having a specific status, that test can be extremely
slow. Changing the membership test to a set makes this operation
substantially faster.

On the mozilla-central repo:

$ hg files -r d14cac631ecc 'set:added()'
before: 28.120s
after:   0.860s

$ hg status --change d14cac631ecc --added
0.690s
2017-03-28 14:40:13 -07:00
David Soria Parra
3e94bf58b6 worker: flush ui buffers before running the worker
a91c6275 introduces flushing ui buffers after a worker finished. If the ui was
not flushed before the worker was started, fork will copy the existing buffers
to the worker. This causes messages issued before the worker started to be
written to the terminal for each worker.

We are now flushing the ui before we start a worker and add an appropriate test
which will fail before this patch.
2017-03-28 10:21:38 -07:00
Jun Wu
49dddd702d chgserver: do not copy configs set by environment variables
Config set by environment variables have a source like "$ENVNAME". They
should not be copied because they will be recalculated by
rcutil.rccomponents.
2017-03-28 08:40:12 -07:00
Jun Wu
27200c179a rcutil: extract duplicated logic to a lambda
This simplifies the code a bit.
2017-03-28 07:57:56 -07:00
Jun Wu
97882c200c rcutil: unindent a block
Since global _rccomponents is gone, the code could be simplified.
2017-03-28 07:55:32 -07:00
Jun Wu
d209e31ef0 rcutil: do not cache rccomponents
The function is only called once except for "hg debugconfig", where it is
called twice. So there is no need to cache it.

Caching it will cause issues with chgserver. Instead of dropping the cache
in chgserver, it seems cleaner to just avoid the cache.
2017-03-28 07:54:00 -07:00
Matt Harbison
ac12c54058 ui: rerun color.setup() once the pager has spawned to honor 'color.pagermode'
Otherwise, ui.pageractive is False when color is setup in dispatch.py (without
--pager=on), and this config option is ignored.
2017-03-25 19:17:11 -04:00
Matt Harbison
ca66dceee3 ui: defer setting pager related properties until the pager has spawned
When --pager=on is given, dispatch.py spawns a pager before setting up color.
If the pager failed to launch, ui.pageractive was left set to True, so color
configured itself based on 'color.pagermode'.  A typical MSYS setting would be
'color.mode=auto, color.pagermode=ansi'.  In the failure case, this would print
a warning, disable the pager, and then print the raw ANSI codes to the terminal.

Care needs to be taken, because it appears that leaving ui.pageractive=True was
the only thing that prevented an attempt at running the pager again from inside
the command.  This results in a double warning message, so pager is simply
disabled on failure.

The ui config settings didn't need to be moved to fix this, but it seemed like
the right thing to do for consistency.
2017-03-25 21:12:00 -04:00
Matt Harbison
75ecbbd369 color: stop mutating the default effects map
A future change will make color.setup() callable a second time when the pager is
spawned, in order to honor the 'color.pagermode' setting.  The problem was that
when 'color.mode=auto' was resolved to 'win32' in the first pass, the default
ANSI effects were overwritten, making it impossible to honor 'pagermode=ansi'.
Also, the two separate maps didn't have the same keys.  The symmetric difference
is 'dim' and 'italic' (from ANSI), and 'bold_background' (from win32).  Thus,
the update left entries that didn't belong for the current mode.  This bled
through `hg debugcolor`, where the unsupported ANSI keys were listed in 'win32'
mode.

As an added bonus, this now correctly enables color with MSYS `less` for a
command like this, where pager is forced on:

    $ hg log --config color.pagermode=ansi --pager=yes --color=auto

Previously, the output was corrupted.  The raw output, as seen through the ANSI
blind `more.com` was:

    <-[-1;6mchangeset:   34840:3580d1197af9<-[-1m
    ...

which MSYS `less -FRX` rendered as:

    1;6mchangeset:   34840:3580d1197af91m
    ...

(The two '<-' instances were actually an arrow character that TortoiseHg warned
couldn't be encoded, and notepad++ translated to a single '?'.)

Returning an empty map for 'ui._colormode == None' seems better that defaulting
to '_effects' (since some keys are mode dependent), and is better than None,
which blows up `hg debugcolor --color=never`.
2017-03-25 13:50:17 -04:00
Jun Wu
94b5d9fcfa pager: do not read from environment variable
$PAGER is converted to the pager.pager config item. So it's no longer
necessary to read $PAGER in ui.pager().
2017-03-26 21:43:47 -07:00
Jun Wu
371e8cfb4e ui: simplify geteditor
Now $EDITOR and $VISUAL will affect ui.editor directly. So it's no longer
necessary to test them in ui.geteditor.
2017-03-26 21:41:42 -07:00
Jun Wu
28aef1acdc debugconfig: list environment variables in debug output
Since we print "read config from" for config files, printing environment
variables will make it more consistent.
2017-03-26 21:40:22 -07:00
Jun Wu
cc0447168c rcutil: let environ override system configs (BC)
This is BC because system configs won't be able to override $EDITOR, $PAGER.
The new behavior is arguably more rational.
2017-03-26 21:33:37 -07:00
Jun Wu
4fdac40b3b rcutil: add a method to convert environment variables to config items
Handling config and environ priorities has been messy. Partially because we
don't have config layers - you either get all configs (sys + user), or none.

Ideally, environ like $EDITOR, $PAGER should be able to override the system
configs "ui.editor", "pager.pager". This patch provides the ability to
convert them into config items, so they can be inserted into the middle
config layer between system rc and user rc.
2017-03-26 21:27:02 -07:00
Jun Wu
d4692f9619 rcutil: let rccomponents return different types of configs (API)
The next patches will convert environ to raw config items, and insert the
config items between systemrcpath and userrcpath. This patch teaches
rccomponents to return the type information so the caller could distinguish
between "path" and raw config "items".
2017-03-26 21:04:29 -07:00
Jun Wu
d63b24bd71 rcutil: rename rcpath to rccomponents (API) 2017-03-26 20:48:00 -07:00
Jun Wu
d3c5fd327d rcutil: extract rc directory listing logic
The logic of listing a ".rc" directory is duplicated in two functions,
extract it to a single function to make the code cleaner.
2017-03-26 20:46:05 -07:00
Jun Wu
8c4790a38b rcutil: split osrcpath to return default.d paths (API)
After this change, there are 3 rcpath functions:

  - defaultrcpath
  - systemrcpath
  - userrcpath

This will allow us to insert another config layer in the middle.
2017-03-26 20:21:32 -07:00
Jun Wu
582704c32f rcutil: move scmutil.*rcpath to rcutil (API)
As discussed at [1], the logic around "actual config"s seem to be
non-trivial enough that it's worth a new module.

This patch creates the module and move "scmutil.*rcpath" functions there as
the first step. More methods will be moved to the module in the future.

The module is different from config.py because the latter only cares about
data structure and parsing, and does not care about special case, or system
config paths, or environment variables.

[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095503.html
2017-03-26 20:18:42 -07:00
Yuya Nishihara
91f8f53f3a statfs: make getfstype() raise OSError
It's better for getfstype() function to not suppress an error. Callers can
handle it as necessary. Now "hg debugfsinfo" will report OSError.
2017-03-25 17:25:23 +09:00
Yuya Nishihara
91e4f106c6 statfs: rename pygetfstype to getfstype
There's no ambiguity now.
2017-03-25 17:24:11 +09:00
Yuya Nishihara
413bbbf814 statfs: refactor inner function as a mapping from statfs to string
The major difference between BSD and Linux is how to get a fstype string.
Let's split the longest part of getfstype() as a pure function.
2017-03-25 17:23:21 +09:00
Yuya Nishihara
6b9644999c statfs: simplify handling of return value
Py_BuildValue() can translate NULL pointer to None.
2017-03-25 17:13:12 +09:00
Pierre-Yves David
368236438f tags: deprecated 'repo.tag'
All user are gone. We can now celebrate the removal of some extra line from the
'localrepo' class.
2017-03-27 16:00:47 +02:00
Pierre-Yves David
b55b067aa6 tags: use the 'tag' function from the 'tags' module in the 'tag' command
There is No need to go through the 'repo' object method anymore.
2017-03-27 16:00:34 +02:00
Pierre-Yves David
8141af4eed tags: move 'repo.tag' in the 'tags' module
Similar logic, pretty much nobody use this method (that creates a tag) so we
move it into the 'tags' module were it belong.
2017-03-27 15:58:31 +02:00
Pierre-Yves David
f1cbb59e75 tags: move '_tags' from 'repo' to 'tags' module
As far as I understand, that function do not needs to be on the local repository
class, so we extract it in the 'tags' module were it will be nice and
comfortable. We keep the '_' in the name since its only user will follow in the
next changeset.
2017-03-27 15:55:07 +02:00
Denis Laxalde
2702418496 hgweb: filter diff hunks when 'linerange' and 'patch' are specified in filelog 2017-03-13 15:17:20 +01:00
Denis Laxalde
69dcb458cd hgweb: add a 'linerange' parameter to webutil.diffs()
This is used to filter out hunks based on their range (with respect to 'node2'
for patch.diffhunks() call, i.e. 'ctx' for webutil.diffs()).

This is the simplest way to filter diff hunks, here done on server side. Later
on, it might be interesting to perform this filtering on client side and
expose a "toggle" action to alternate between full and filtered diff.
2017-03-13 15:15:49 +01:00
Denis Laxalde
996bd4af95 hgweb: handle a "linerange" request parameter in filelog command
We now handle a "linerange" URL query parameter to filter filelog using
a logic similar to followlines() revset.
The URL syntax is: log/<rev>/<file>?linerange=<fromline>:<toline>
As a result, filelog entries only consists of revision changing specified
line range.

The linerange information is propagated to "more"/"less" navigation links but
not to numeric navigation links as this would apparently require a dedicated
"revnav" class.

Only update the "paper" template in this patch.
2017-01-19 17:41:00 +01:00
Jun Wu
faefa92b13 metadataonlyctx: speed up sanity check
Previously the sanity check will construct manifestctx for both p1 and p2.
But it only needs the "manifest node" information, which could be read from
changelog directly.
2017-03-26 12:26:35 -07:00
Denis Laxalde
a70f2fcec7 revset: factor out linerange processing into a utility function
Similar processing will be done in hgweb.webutil in forthcoming changeset.
2017-02-24 18:39:08 +01:00
Denis Laxalde
6b9779860f hgweb: add a "patch" query parameter to filelog command
Add support for a "patch" query parameter in filelog web command similar to
--patch option of `hg log` to display the diff of each changeset in the table
of revisions. The diff text is displayed in a dedicated row of the table that
follows the existing one for each entry and spans over all columns. Only
update "paper" template in this patch.
2017-03-13 10:41:13 +01:00
Denis Laxalde
53e1237344 hgweb: handle "parity" internally in webutil.diffs()
There's apparently no reason to have the "parity" of diff blocks that
webutil.diffs() generates coming from outside the function. So have it
internally managed. We thus now pass a "web" object to webutil.diffs() to get
access to both "repo" and "stripecount" attribute.
2017-03-13 10:40:19 +01:00
Yuya Nishihara
6286308d03 py3: fix manifestdict.fastdelta() to be compatible with memoryview
This doesn't look nice, but a straightforward way to support Python 3.
bytes(m[start:end]) is needed because a memoryview doesn't support ordering
operations. On Python 2, m[start:end] returns a bytes object even if m is
a buffer, so calling bytes() should involve no additional copy.

I'm tired of trying cleaner alternatives, including:

 a. extend memoryview to be compatible with buffer type
    => memoryview is not an acceptable base type
 b. wrap memoryview by buffer-like class
    => zlib complains it isn't bytes-like
2017-03-26 19:06:48 +09:00
Jun Wu
6b0a252616 crecord: use ProgrammingError 2017-03-26 17:00:23 -07:00