Previously, changes to the configuration would not be picked up by a
running server. That feels like a bug. Regenerate the web substitutions
table when the repository changes.
Previously, we were using the last mtime we saw when reporting the
HTTP cache etag. When we added bookmarks to the end of the list of
files checked, unchanged or missing bookmarks would keep the client
cache from being invalidated.
Long ago we disabled trust of the templates path with a comment
describing the (insecure) behavior before the change. At some later
refactor, the code was apparently changed back to match the comment,
unaware that the intent of the comment was to describe the behavior to
avoid.
This change disables the trust and updates the comment to explicitly
say not only what the old problem was, but also that it was in fact a
problem and the action taken to prevent it.
Impact: prior to this change, if you had a UNIX-based hgweb server
where users can write hgrc files, those users could potentially read
any file readable by the web server.
This is marked as a backwards compatibility issue because people may
have configured templates without proper trust settings. Issue spotted
by Greg Szorc.
This does change behavior in that the templatepath could change during
the lifetime of the server. But everything else can change, I don't see
why template paths can't.
We want refresh() to only be about refreshing repository
instances. This state doesn't belong in requestcontext
because it is shared across multiple threads.
The initial value is irrelevant since refresh() compares it to
a tuple of tuples of file mtime and size. None != tuple and
None is a better default value than a tuple containing irrelevant
values.
As part of this, "archive_specs" was renamed to "archivespecs" to align
with naming conventions.
"archive_specs" didn't technically need to be moved from hgweb. But it
seemed to make sense to have all the archive code in the same class.
As part of this, hgweb.configlist is no longer used, so it was deleted.
Various config options from the repository were stored on the
hgweb instance. While unlikely, there could be race conditions between
a new request updating these values and an in-flight request seeing both
old and new values, leading to weird results.
We move some of options/attributes from hgweb to requestcontext.
As part of this, we establish config* helpers on requestcontext. As
part of the move, we changed int() casts to configint() calls. The
int() usage likely predates the existence of configint().
We also removed config option updating from once every refresh to every
request. I don't believe obtaining config options is expensive enough to
warrant only doing when the repository has changed.
The excessive use of object.__setattr__ is unfortunate. But it will
eventually disappear once the proxy is no longer necessary.
Currently, hgweb applications have many instance variables holding
mutated state. This is somewhat problematic because multiple threads
may race accessing or changing this state.
This patch starts a series that will add more thread safety to
hgweb applications. It will do this by moving mutated state out
of hgweb and into per-request instances of the newly established
"requestcontext" class.
Our new class currently behaves like a proxy to hgweb instances. This
should change once all state is captured in it instead of hgweb. The
effectiveness of this proxy is demonstrated by passing instances of
it - not hgweb instances/self - to various functions.
The "request" argument has been optional in this code since
it was introduced in bb95879961db in 2009. There are no consumers that
don't pass this argument. So don't make it an optional argument.
It took longer than I wanted to grok how the various parts of hgweb
worked. So I added some class and method documentation to help whoever
hacks on this next.
Tags and bookmarks on summary page are already limited to 10, let's limit
branches as well.
Each of the blocks (tags, bookmarks, branches) currently has a link to the full
list.
This allows showing correct status for each branch, which was missing on
/summary. Usually that means that closed branches get the same css class
(resulting in e.g. different color/shade) as they do on /branches page.
The sorting of the branches on summary page also changes and is now the same as
on /branches page: closed branches are now at the end of the list.
hgwebdir refreshes the set of known repositories periodically. This
is necessary because refreshing on every request could add significant
request latency.
More than once I've found myself wanting to tweak this interval at
Mozilla. I've also wanted the ability to always refresh (often when
writing tests for our replication setup).
This patch makes the refresh interval configurable. Negative values
indicate to always refresh. The default is left unchanged.
There needs to be a way to escape symbolic revisions containing forward
slashes, but urlescape filter doesn't escape slashes at all (in fact, it is
used in places where forward slashes must be preserved).
The filter considers @ to be safe just for bookmarks like @ and @default to
look good in urls.
It's possible to have a branch/tag/bookmark with all kinds of special
characters, such as {}/\!?. While not very conveniently, symbolic revisions
with such characters work from command line if user correctly quotes the
characters. These characters also work in hgweb, when they are properly
encoded, with one exception: '/' (forward slash, urlencoded as '%2F'), which
was getting decoded before hgweb could parse it as a part of PATH_INFO.
Because of that, hgweb was seeing it as any other forward slash, that is, as
just another url parts separator.
For example, if user wanted to see the content of dir/file at bookmark
'feature/eggs', url could be: '/file/feature%2Feggs/dir/file'. But hgweb tried
to find a revision 'feature' and get contents of 'eggs/dir/file'.
To fix this, let's assume forward slashes are doubly-urlencoded (%252F), so
CGI/WSGI server decodes it into %2F. Then we can decode %2F in the revision
part of the url into an actual '/' character.
Making hgweb produce such urls will be done in the next 2 patches.
This makes changes to bookmarks visible to hgweb through the official
way. There is no change to tests because there is currently another
hack in place to ensure the same behavior.
The refresh feature was explicitly testing if '00changelog.i' and 'phaseroots'
changed. This is overlooking other important information like bookmarks and
obsstore (bookmark have their own hack to work around it).
We move to a more extensible system with a list of files of interest
that will be used to build the repo state. The system should probably
move into a more central place so that the command server and other
systems are able to use it. Extension writers will also be able to add
entries to ensure that changes to extension data are properly detected.
Also the current key (mtime, size) is notably weak for bookmarks and phases
whose files can easily change content without effect on their size.
Still, this patch seems like a valuable minimal first step.
Python 2.6 introduced the "except type as instance" syntax, replacing
the "except type, instance" syntax that came before. Python 3 dropped
support for the latter syntax. Since we no longer support Python 2.4 or
2.5, we have no need to continue supporting the "except type, instance".
This patch mass rewrites the exception syntax to be Python 2.6+ and
Python 3 compatible.
This patch was produced by running `2to3 -f except -w -n .`.
One of the features of hgweb is that current position in repo history is
remembered between separate requests. That is, links from /rev/<node_hash> lead
to /file/<node_hash> or /log/<node_hash>, so it's easy to dig deep into the
history. However, such links could only use node hashes and local revision
numbers, so while staying at one exact revision is easy, staying on top of the
changes is not, because hashes presumably can't change (local revision numbers
can, but probably not in a way you'd find useful for navigating).
So while you could use 'tip' or 'default' in a url, links on that page would be
permanent. This is not always desired (think /rev/tip or /graph/stable or
/log/@) and is sometimes just confusing (i.e. /log/<not the tip hash>, when
recent history is not displayed). And if user changed url deliberately to say
default instead of <some node hash>, the page ignores that fact and uses node
hash in its links, which means that navigation is, in a way, broken.
This new property, symrev, is used for storing current revision the way it was
specified, so then templates can use it in links and thus "not dereference" the
symbolic revision. It is an additional way to produce links, so not every link
needs to drop {node|short} in favor of {symrev}, many will still use node hash
(log and filelog entries, annotate lines, etc).
Some pages (e.g. summary, tags) always use the tip changeset for their context,
in such cases symrev is set to 'tip'. This is needed in case the pages want to
provide archive links.
highlight extension needs to be updated, since _filerevision now takes an
additional positional argument (signature "web, req, tmpl" is used by most of
webcommands.py functions).
More references to symbolic revisions and related gripes: issue2296, issue2826,
issue3594, issue3634.
This documentation was mostly intended for the user helps. However given the
lack of request for such feature, we should keep it un-documented. We stick the
help text in the code as it could still be useful to fellow contributors.
Previously, if a subrepo parent had 'web.hidden=True' set, neither the parent
nor child had a repository entry. However, the directory entry for the parent
would be listed (it wouldn't have the fancy 'web.name' if configured), and that
link went to the repo's summary page, effectively making it not hidden.
This simply disables the directory processing if a valid repository is present.
Whether or not the subrepo should be hidden is debatable, but this leaves that
behavior unchanged (i.e. it stays hidden).
Previously, when 'web.name' was set on a subrepo parent and 'web.collapse=True',
the parent repo would show in the list with the configured 'web.name', and a
directory with the parent repo's filesystem name (with a trailing slash) would
also appear. The subrepo(s) would unexpectedly be excluded from the list of
repositories. Clicking the directory entry would go right to the repo page.
Now both the parent and the subrepos show up, without the additional directory
entry.
The configured hgweb paths used '**' for finding the repos in this scenario.
A couple of notes about the tests:
- The area where the subrepo was added has a comment that it tests subrepos,
though none previously existed there. One now does.
- The 'web.descend' option is required for collapse to work. I'm not sure what
the previous expectations were for the test. Nothing changed with it set,
prior to adding the code in this patch. It is however required for this test.
- The only output changes are for the hyperlinks, obviously because of the
'web.name' parameter.
- Without this code change, there would be an additional diff:
--- /usr/local/mercurial/tests/test-hgwebdir.t
+++ /usr/local/mercurial/tests/test-hgwebdir.t.err
@@ -951,7 +951,7 @@
/rcoll/notrepo/e/
/rcoll/notrepo/e/e2/
/rcoll/notrepo/f/
- /rcoll/notrepo/f/f2/
+ /rcoll/notrepo/f/
Test repositories inside intermediate directories
I'm not sure why the fancy name doesn't come out, but it is enough to
demonstrate that the parent is not listed redundantly, and the subrepo isn't
skipped.
revset.parse() should be responsible for all parsing errors. Perhaps it wasn't
because 'revset.parse' was not a real function when the validation code was
added at ac01134d0a40.
Before this patch, doc string of each web commands isn't extracted as
translatable one, even though web commands are listed up in "hg help
hgweb".
This patch adds "mercurial/hgweb/webcommands.py" on to arguments of
"i18n/hggettext". "i18nfunctions" added into "webcommands.py" is used
by "i18n/hggettext" to get the list of functions having translatable
doc string.
Surpringly, the templates didn't receive an unmodified version of the
line numbers. Expose it to make implementing the JSON templates easier.
In theory, we could post-process an existing template variable. But
extra string manipulation seems quite wasteful, especially on items that
could occur hundreds or even thousands of times in output.
It's pretty surprising phase wasn't part of this template call already.
We now expose {phase} to the {changeset} template and we expose this
data to JSON.
This brings JSON output in line with the output from `hg log -Tjson`.
The lone exception is hweb doesn't print the numeric rev. As has been
stated previously, I don't believe hgweb should be exposing these
unstable identifiers. (We can add them later if we really want them.)
There is still work to bring hgweb in parity with --verbose and
--debug output from the CLI.
Browse (or manifest) action allows browsing the directory structure at some
specified revision. In gitweb and monoblue styles, the revision header already
has branch/tag/bookmark information for the revision, but in paper style this
header was only showing tags. This patch adds branches and bookmarks.
Branch name needs to be obtained in this special way to be consistent with
regular changeset page, where in paper style default branch is never shown.
Although Python supports `X = Y if COND else Z`, this was only
introduced in Python 2.5. Since we have to support Python 2.4, it was
a very common thing to write instead `X = COND and Y or Z`, which is a
bit obscure at a glance. It requires some intricate knowledge of
Python to understand how to parse these one-liners.
We change instead all of these one-liners to 4-liners. This was
executed with the following perlism:
find -name "*.py" -exec perl -pi -e 's,(\s*)([\.\w]+) = \(?(\S+)\s+and\s+(\S*)\)?\s+or\s+(\S*)$,$1if $3:\n$1 $2 = $4\n$1else:\n$1 $2 = $5,' {} \;
I tweaked the following cases from the automatic Perl output:
prev = (parents and parents[0]) or nullid
port = (use_ssl and 443 or 80)
cwd = (pats and repo.getcwd()) or ''
rename = fctx and webutil.renamelink(fctx) or []
ctx = fctx and fctx or ctx
self.base = (mapfile and os.path.dirname(mapfile)) or ''
I also added some newlines wherever they seemd appropriate for readability
There are probably a few ersatz ternary operators still in the code
somewhere, lurking away from the power of a simple regex.
Similar in spirit to df435ff00d29, I want to write an extension to
make available extra template keywords so hgweb templates can include
extra data.
To do this today requires monkeypatching the templater, which I think is
the wrong place to perform this modification.
This patch extracts the creation of the templater arguments to a
standalone function - one that can be monkeypatched by extensions.
I would very much like for extensions to be able to inject extra
templater parameters into *any* template. However, I'm not sure the best
way to facilitate this, as hgweb commands invoke the templater before
returning and we want the extensions to have access to rich data
structures like the context instances. We need cooperation inside hgweb
command functions. The use case screams for something like internal-only
"hooks." This is exactly what my (rejected) "events" patch series
provided. Perhaps that feature should be reconsidered...
The issue is titled "filtered revision 'XXX' (not in 'served' subset)" and that
is the error message you sometimes get when trying to look at a file (/file or
/annotate) in hgweb. For example:
http://hg.intevation.org/mercurial/crew/file/8414f8487b33/mercurial/cmdutil.py
This happens when a parent revision for a file is hidden, thus it is
not 'served' and isn't accessible in hgweb by default. When hgweb tries to
access such changeset, it produces the error and HTTP status code 404.
Another detail is that the parents() function, that is used in multiple places
in hgweb, sometimes returned changesets that were obsoleted by the current
changeset for the file. For example, when using rebase with evolve and rebasing
a divergent changeset that introduces a file on top of current branch. Or
grafting a change and making the new grafted changeset obsolete the source
(shown in the test case). The result is the same - the obsoleted changeset was
mistakingly returned from parents(), even though it's not a parent and the only
link to the new changeset is an obsoletion marker (and rebase/graft metadata?
not sure it matters).
The problem is fixed by using introrev() instead of linkrev() for finding
parents. This prevents parents() function from returning unrelated obsolete
changesets.
The test case prepares a separate repo because (afaict) all other test cases
never reuse file names, so there are no files that were changed in multiple
changesets. So no previously available files have obsolete changesets in their
history.
This change is intended to avoid exposing the implementation detail to
callers. I'm going to extend fullreposet to support "null" revision, so
these mfunc calls will have to use fullreposet() instead of spanset().
A subsequent patch will introduce an import cycle between mercurial.help
and mercurial.hgweb.webcommands. Break the cycle by moving the import of
mercurial.help into the web command that actually needs it.
I want to supplement changelist entries (used by shortlog and changelog
endpoints) with custom metadata from an extension. i.e. I have extra
per-changeset metadata that I wish to make available to templates so it
can be rendered on hgweb.
To facilitate this, I've extracted the logic for creating a changeset
data structure into its own function, where it can be wrapped by
extensions.
Ideally, hgweb would use the same templater as the command line and have
full access to templatekw.keywords. But that's a lot of work. This patch
gets us some of the benefit without all the work.
Many other hgweb commands could benefit from similar refactorings. I'm
going to wait to see how this patch is received before I go crazy on
extracting inline functions.
This adds UI portion of the feature that has resided in mercurial since 2012.
Back then the interface was added together with the code, but was shortly
backed out because it was deemed "not ready". Code, however, stayed.
For the original feature and its implementation, see issue2810 and
3ff83729b63f.
In short, the backed-out interface had two outstanding issues:
1. it was introducing an entirely new term (baseline) and
2. it was present on every changeset's page, even for changesets with 1 parent
(or no parents), which didn't make sense
This patch implements a hopefully better interface because:
1. it uses the usual terms (diff) and
2. it only shows up when there actually are 2 parents.
This patch fixes a bug where hgweb would send an incomplete HTTP
response.
If an uncaught exception is raised when hgweb is processing a request,
hgweb attempts to send a generic error response and log that exception.
The server defaults to chunked transfer coding. If an uncaught exception
occurred, it was sending the error response string / chunk properly.
However, RFC 7230 Section 4.1 mandates a 0 size last chunk be sent to
indicate end of the entity body. hgweb was failing to send this last
chunk. As a result, properly written HTTP clients would assume more data
was coming and they would likely time out waiting for another chunk to
arrive.
Mercurial's own test harness was paving over the improper HTTP behavior
by not attempting to read the response body if the status code was 500.
This incorrect workaround was added in faced8f5c2af and has been removed
with this patch.
A matcher is required when enabling the subrepo option on archival.archive(),
because that calls match.narrowmatcher(), which accesses fields on the object.
It's therefore probably a bad idea to default the matcher to None on archive(),
but that's a fix for default.
Make hgweb.refresh() also look at phaseroots file (in addition to 00changelog.i
file) and reload the repo when os.stat returns different mtime or size than
cached, signifying the file was modified.
This way if user changes phase of a changeset (secret <-> draft), there's no
need to restart hg serve to see the change.
Traditionally, the way to specify a command for hgweb was to use url query
arguments (e.g. "?cmd=batch"). If the command is unknown to hgweb, it gives an
error (e.g. "400 no such method: badcmd").
But there's also another way to specify a command: as a url path fragment (e.g.
"/graph"). Before, hgweb was made forgiving (looks like it was made in
cd356f4efd91) and user could put any unknown command in the url. If hgweb
couldn't understand it, it would just silently fall back to the default
command, which depends on the actual style (e.g. for paper it's shortlog, for
monoblue it's summary). This was inconsistent and was breaking some tools that
rely on http status codes (as noted in the issue4071). So this patch changes
that behavior to the more consistent one, i.e. hgweb will now return "400 no
such method: badcmd".
So if some tool was relying on having an invalid command return http status
code 200 and also have some information, then it will stop working. That is, if
somebody typed foobar when they really meant shortlog (and the user was lucky
enough to choose a style where the default command is shortlog too), that fact
will now be revealed.
Code-wise, the changed if block is only relevant when there's no "?cmd" query
parameter (i.e. only when command is specified as a url path fragment), and
looks like the removed else branch was there only for falling back to default
command. With that removed, the rest of the code works as expected: it looks at
the command, and if it's not known, raises a proper ErrorResponse exception
with an appropriate message.
Evidently, there were no tests that required the old behavior. But, frankly, I
don't know any way to tell if anyone actually exploited such forgiving behavior
in some in-house tool.
hgweb detects out-of-date repository instances (using a highly
suspect mechanism that should probably be fixed) and obtains a new
repository object if needed.
This patch changes the repository object copy to use the repo URL
(instead of path). This preserves more information about the source
repository and allows bundles to be served through hgweb.
A test verifying that bundles can now be served properly via
`hg serve` has been added.
_ is usually used for i18n markup but we also used it for I-don't-care
variables.
Instead, name don't-care variables in a slightly descriptive way but use the _
prefix to designate unused variable.
This will mute some pyflakes "import '_' ... shadowed by loop variable"
warnings.
Before this patch, revision numbers and hash values in "comparison"
page are gotten from not changelog but filelog.
Such filelog information is useful only for hgweb debugging, and may
confuse users.
This patch shows revision numbers and hash values gotten from
changelog in "comparison" page.
Before this patch, "parents" in pages for file doesn't show as same
parents as "hg parents -r REV FILE", when the specified file is not
modified in the specified revision.
For example, it is assumed that revision A, B and D change file "f".
changelog (A) ---> (B) ---> (C) ---> (D)
filelog "f" (x) ---> (y) ------------> (z)
"/file/D/f" invokes "webutil.parents()" with filectx(z) gotten from
changectx(D), and it returns changectx(B). This is as same result as
"hg parents -r D f".
In the other hand, "/file/C/f" invokes "webutil.parents()" with
filectx(y') gotten from changectx(C), and it returns changectx(A),
because filectx(y') is linked to changectx(B), and works like
filectx(y) in some cases.
In this case, revision B is hidden from users browsing file "f" in
revision C.
This patch shows as same parents as "hg parents -r REV FILE" in pages
for file, by making "webutil.parents()" return:
- "linkrev()"-ed revision only, if:
- specified context instance is "filectx" (because
"webutil.parents()" is invoked with changectx, too), and
- (1) the revision from which filectx is gotten and (2) the one to
which filectx is linked are different from each other
- revision gotten from "ctx.parents()", otherwise