hgweb currently offers limited functionality for "classifying"
repositories. This patch aims to change that.
The web.labels config option list is introduced. Its values
are exposed to the "index" and "summary" templates. Custom
templates can use template features like ifcontains() to e.g.
look for the presence of a specific label and engage specific
behavior. For example, a site operator may wish to assign a
"defunct" label to a repository so the repository is prominently
marked as dead in repository indexes.
I.e. when a revision blames a block of source lines, only display the
revision link on the first line of the block (this is identified by the
"blockhead" key in annotate context).
This addresses item "Visual grouping of changesets" of the blame improvements
plan (https://www.mercurial-scm.org/wiki/BlamePlan) which states: "Typically
there are block of lines all attributed to the same revision. Instead of
rendering the revision/changeset for every line, we could only render it once
per block."
Change summary webcommand to yield each element of the shortlog instead of the
entire list.
This makes generated json more readable since each entry can be formatted
separately, instead of returning all the shortlog content in a single string.
New frommapfile() function will make it clear when template aliases will be
loaded. They should be applied to command arguments and templates in hgrc,
but not to map files. Otherwise, our stock styles and web templates
(i.e map-file templates) could be modified unintentionally.
Future patches will add "aliases" argument to __init__(), but not to
frommapfile().
Changes, branches and tags are already in revlog order on /summary, /branches
and /tags, let's now make bookmarks be sorted by the same principle. It's more
helpful to show more "recent" bookmarks on top. This will affect /bookmarks
page in all styles, including atom, rss and raw, and also /summary page.
Bookmarks are sorted using a (revision number, bookmark name) tuple.
Let's do the same thing that /tags page does. It gets sorted tags and then if
it needs the latest only, it just slices the first item from the list. Since
it's a slice and not a min(), it doesn't throw an exception if the list is
empty. This fixes HTTP 500 error from issue5022.
Entries prepared in webutil.changelistentry() skip showing parents in the
trivial case when there's only one parent and it's the previous revision. This
doesn't work well for the json-log template, which is supposed to just dump raw
data in an easy-to-parse format, so let's provide all parents as another
keyword: allparents.
Using a lambda function here means that the performance of templates that don't
use allparents won't be affected (see 88bd6697bfad).
narrowhg (for its narrow spec) and remotefilelog (for its large batch
requests) would like to be able to make requests with argument sets so
absurdly large that they blow out total request size limit on some
http servers. As a workaround, support stuffing args at the start
of the POST body.
We will probably want to leave this behavior off by default in servers
forever, because it makes the old "POSTs are only for writes"
assumption wrong, which might break some of the simpler authentication
configurations.
This is necessary to preserve filename encoding over JSON. Instead, this
patch inserts "|utf8" where non-ascii local-encoding texts can be passed
to "|json".
See also the commit that introduced "utf8" filter.
Because future patches will change "|json" filter to handle input bytes
transparently, i.e. use UTF-8b encoding, "{jsdata}" must keep data in UTF-8
bytes, whereas "{nodes}" are text.
This patch inserts encodestr() where localstr is likely to survive.
The new function is used to fill basic information about a ctx, such as
revision number and hash, author, commit message, etc. Before, every webcommand
used to get this basic information on its own using some boilerplate code, and
some things in some places just weren't available.
Before this patch, it was unclear why the httpservice object could read the
server options (e.g. --port) from 'ui'. It just worked because repo.ui is ui.
Since createservice() was moved to hgweb and hgweb imports both hgweb_mod and
hgwebdir_mod, we no longer have to force hgweb() function to select one of
them by the type of 'o' variable. Let's be explicit!
This patch does not change hgweb() function because it is the interface of
existing WSGI and CGI scripts.
A block of code above this one already says "if fctx is not None", and it's
also what this code actually intends to check, so let's be specific as PEP-8
recommends.
c0ebd60607e9 didn't remove it, let's do it now.
Placing the added lines into the already existing "if fctx is not None" block
also makes webcommands.comparison() look a bit more like
webcommands.filediff(), which eases possible future refactoring. And fctx is
not None only when path in ctx, so logically it's equivalent.
When comparing a file that was removed at the current revision, parents used to
show grandparents instead, due to how fctx was "shifted" from the current
revision to its p1. Let's not do that.
The fix is pretty much copied from webcommands.filediff().
The next patch will merge the cmdutil.service() calls of both commandserver
and hgweb. Before doing it, this patch wipes out the code specific to hgweb
from commands.serve().
_siblings is a helper that is used for displaying changeset parents and
children in hgweb. Before, when it was a simple generator, it couldn't tell its
length without being consumed, and that required a special case when preparing
data for changeset template (see 3468fd599ef4).
Let's make it into a class (similar to templatekw._hybrid) that allows len(...)
without side-effects.
Log pages, i.e. changelog, filelog and search results page computed children
and parents for each changeset shown, because spartan hgweb style shows this
info. Turns out, computing all this is heavy and also unnecessary for log pages
in all other hgweb styles.
Luckily, templates allow an easy way to do computations on demand: just pass
the heavy part of code as a callable and it will be only called when needed.
Here are some benchmarks on the mercurial repository (best of 3):
time wget http://127.0.0.1:8021/
before: 0m0.050s
after: 0m0.040s
time wget http://127.0.0.1:8021/?revcount=960
before: 0m1.164s
after: 0m0.389s
time wget http://127.0.0.1:8021/log/tip/mercurial/commands.py
before: 0m0.047s
after: 0m0.042s
time wget http://127.0.0.1:8021/log/tip/mercurial/commands.py?revcount=960
before: 0m0.830s
after: 0m0.434s
Since Python 2.5 str has new methods: partition and rpartition. They are more
specialized than the usual split and rsplit, and they sometimes convey the
intent of code better and also are a bit faster (faster than split/rsplit with
maxsplit specified). Let's use them in appropriate places for a small speedup.
Example performance (partition):
$ python -m timeit 'assert "apple|orange|banana".split("|")[0] == "apple"'
1000000 loops, best of 3: 0.376 usec per loop
$ python -m timeit 'assert "apple|orange|banana".split("|", 1)[0] == "apple"'
1000000 loops, best of 3: 0.327 usec per loop
$ python -m timeit 'assert "apple|orange|banana".partition("|")[0] == "apple"'
1000000 loops, best of 3: 0.214 usec per loop
Example performance (rpartition):
$ python -m timeit 'assert "apple|orange|banana".rsplit("|")[-1] == "banana"'
1000000 loops, best of 3: 0.372 usec per loop
$ python -m timeit 'assert "apple|orange|banana".rsplit("|", 1)[-1] == "banana"'
1000000 loops, best of 3: 0.332 usec per loop
$ python -m timeit 'assert "apple|orange|banana".rpartition("|")[-1] == "banana"'
1000000 loops, best of 3: 0.219 usec per loop
The home of 'Abort' is 'error' not 'util' however, a lot of code seems to be
confused about that and gives all the credit to 'util' instead of the
hardworking 'error'. In a spirit of equity, we break the cycle of injustice and
give back to 'error' the respect it deserves. And screw that 'util' poser.
For great justice.
It's useless to handle file patterns as relative to the cwd of the server
process. The only sensible way in hgweb is to resolve paths relative to the
repository root.
It seems dirstate.getcwd() isn't used to get a real file path, so this patch
won't cause problem.
If code inside a context manager returns a generator, the context
manager exits before the generator is iterated.
hgweb was using a context manager to control thread safe access to a
localrepository instance. But it was returning a generator, so there was
a race condition between a previous request streaming a response to the
client and a new request obtaining the released but in use repository.
By iterating the generator inside the context manager, we ensure we
don't release the repo instance until after the response has finished.
With this change, hgweb finally appears to have full localrepository
isolation between threads. I can no longer reproduce the 2 exceptions
reported in issue4756.
test-hgweb-non-interactive.t has been modified to consume the output
of calling into a WSGI application. Without this, execution of the WSGI
application stalls because of the added yield statement.
cachedlocalrepo.copy() didn't actually create new localrepository
instances. This meant that the new thread isolation code in hgweb wasn't
actually using separate localrepository instances, even though it was
properly using separate cachedlocalrepo instances.
Because the behavior of the API changed, the single caller in hgweb had
to be refactored to always call _webifyrepo() or it may not have used
the proper filter.
I confirmed via print() debugging that id(repo) is in fact different on
each thread. This was not the case before.
For reasons I can't yet explain, this does not fix issue4756. I suspect
there is shared cache somewhere that isn't thread safe.
Before this change, multiple threads/requests could share a
localrepository instance. This meant that all of localrepository needed
to be thread safe. Many bugs have been reported telling us that
localrepository isn't actually thread safe.
While making localrepository thread safe is a noble cause, it is a lot
of work. And there is little gain from doing so. Due to Python's GIL,
only 1 thread may be processing Python code at a time. The benefits
to multi-threaded servers are marginal.
Thread safety would be a lot of work for little gain. So, we're not
going to even attempt it.
This patch establishes a pool of repos in hgweb. When a request arrives,
we obtain the most recently used repository from the pool or create a
new one if none is available. When the request has finished, we put that
repo back in the pool.
We start with a pool size of 1. For servers using a single thread, the
pool will only ever be of size 1. For multi-threaded servers, the pool
size will grow to the max number of simultaneous requests the server
processes.
No logic for pruning the pool has been implemented. We assume server
operators either limit the number of threads to something they can
handle or restart the Mercurial process after a certain amount of
requests or time has passed.
hgweb contained code for determining whether a cached localrepository
instance was up to date. This code was way too low-level to be in
hgweb.
This functionality has been moved to a new "cachedlocalrepo" class
in hg.py. The code has been changed slightly to facilitate use
inside a class. hgweb has been refactored to use the new API.
As part of this refactor, hgweb.repo no longer exists! We're very close
to using a distinct repo instance per thread.
The new cache records state when it is created. This intelligence
prevents an extra localrepository from being created on the first
hgweb request. This is why some redundant output from test-extension.t
has gone away.
We perform some common tasks when a new repo instance is obtained. In
preparation for changing how we obtain repo instances, factor this
functionality into a standalone function.