We do not return the first successors we found, we returns the latest (non
obsolete (mostly)) one following the obsolete link transitively. We update the
documentation to make this clean.
The phrase "cannot edit immutable changeset" is kind of tautological.
Of course unchangeable things can't be changed. We instead mention
"public" and provide a hint so that we can point to the actual
problem. Even in cases where some operation other than edition cannot
be performed, "public" gives the root cause that results in the
"immutable" effect.
There is a precedent for saying "public" instead of "immutable", for
example, in `hg commit --amend`.
Because bundle2 allows a more precise exchange of obsmarkers during pull, it
sends them in a different order (previously unstable because of sets.) As
a result, they are added to the repository in a different order. To stabilize
the order and ensure tests are unchanged when moving from bundle1 to bundle2 we
sort markers when exchanging them.
In the long run, the obsstore will probably not use a linear storage.
Speed up the computation of the unstable revset by using the not public()
revset. In another series of patches, we optimize the not public() revset and
together it leads to a 50-100x speedup on the computation of unstable() for
our big repos.
Speed up the computation of the bumped revset by using the not public() revset.
In another patch series, we optimized the not public() revset. Together,
this cuts the cost of the computation of bumped() from 17% of the total time of
smartlog on our big repo to under 0.1% of the total time.
As for other currently in place cycle detection, arbitrarily cut the first
obsolescence link that create a cycle avoiding the infinite loop. This will have
to be made more deterministic in the future but we do not really care right now.
The precursors and children dictionaries are not used by many
commands. By making them lazily populated,
'hg log -r @~10::@ >/dev/null' is sped up from 0.564s to 0.440s on my
hg.hg repo with 73k markers.
Also make successors lazily populated, mostly for consistency with the
others.
We will soon delay populating precursors until we have to. To prepare
for that, make _checkinvalidmarkers() scan for a nullid precursor in
the list of new markers instead of the (currently) cheaper 'if nullid
in precursors' check.
In preparation for making the successors, precursors, and children
dictionaries lazily populated, break up _load() into one function for
adding markers to each dictionary.
This moves perfloadmarkers from
! result: 63866
! wall 0.239217 comb 0.250000 user 0.240000 sys 0.010000 (best of 42)
to
! result: 63866
! wall 0.218795 comb 0.210000 user 0.210000 sys 0.000000 (best of 46)
This moves perfloadmarkers on my linux workstation (63494 markers from
mpm, crew, and myself) performance from
! wall 0.357657 comb 0.360000 user 0.350000 sys 0.010000 (best of 28)
to
! wall 0.222345 comb 0.220000 user 0.210000 sys 0.010000 (best of 41)
which is a pretty good improvement.
On my BSD machine, which is ancient and slow, before:
! wall 3.584964 comb 3.578125 user 3.539062 sys 0.039062 (best of 3)
after:
! wall 2.267974 comb 2.265625 user 2.195312 sys 0.070312 (best of 5)
I feel like we could do better by moving the whole generator function
into C, but I didn't want to tackle that right away.
Some evolve user ignored the invalid markers for about two years and still have
some of them in some repository. This lead to plain abort whenever mercurial try
to open such repo. We need reinstall some way to clean this up in the evolve
extension. For this purpose, we need the checker code wrap-able independently.
This is scheduled for stable as this issue is blocking some evolve user.
We have two different types of node type (sha1 and sha256, only sha1 is used
now) and therefor different sizes for them. We now compute the value once
instead of redoing the computation every loop. This has no visible performance
impact.
Python garbage collection is triggered by container creation. So code that
creates a lot of tuples tends to trigger GC a lot. We disable the gc during
obsolescence marker parsing and associated initialization. This provides an
interesting speedup (25%).
Load marker function on my 58758 markers repo:
before: 0.468247 seconds
after: 0.344362 seconds
The benefit is a bit less visible overall. With python2.6 on my system I see:
after: 0.60
before: 0.53
The difference is probably explained by the delaying of a costly GC. (but there
is still a win). Marking involved tuples, lists and dicts as ignorable by the
garbage collector should give us more benefit. But this is another adventure.
Thanks goes to Siddharth Agarwal for the lead.
For python struct module, "d" is double. But for python string
formating, "d" is integer. We want to preserve the floating point
nature of the data, so we store it in the metadata as floating
point. We use "%r" to make sure we get as many significant digitis as
necessary to restore the float to the exact same value on the other
side.
The fm1 is transmitting the information as float. The lack of this made
fm1-stored markers not survive a round-trip to fm0 leading to duplicated
markers (or two markers very alike).
The basic obsolete option is allowing the creation of obsolete markers. This
does not enable other features, such as allowing unstable commits or exchanging
obsolete markers.
Previously, obstore read the obsolete._enabled flag to determine whether to
allow writes to the obstore. Since obsolete._enabled will be moving into a repo
specific config, we can't read it globally, and therefore must pass the
information into the constructor.
Previously, obsolete used the module level _enabled flag to determine whether it
was on or off. We need a bit more granular control, so we'll be introducing
toggle options. The isenabled() function is how you check if a particular option
is enabled for the given repository.
Future patches will add options such as 'createmarkers', 'allowunstable', and
'exchange' to enable various features of obsolete markers.
This option controls what version of the binary format to use when creating a new
obsstore file.
Default is still the old format. No safeguards are currently placed around the
option value, but no clueless users are in danger of harm since it is
undocumented.
This new encoding explicitly stores the date and parents allowing a
significantly faster marker decoding. See inline documentation for details.
This format is not yet used to store format on disk. But it will be used in
bundle2 exchange if both side support it. Support for on-disk format is coming
in other changesets.
We add flag to inform that the marker is using sha256 hashes. As format 0 is not
able to handle sha256 hashes (32 bytes long), we plain crash if we even attempt to
encode a sha256 with it.
Different formats will encode metadata in different ways. So we cannot keep the
binary blob in the object anymore. We use a tuple to ensure it is immutable and
hashable.
Recent scientific studies show that assigning a variable have no effect on
a unrelated constant. We therefore use the variable where we intended to in the
first place.
All contents includes documentation, constants, and functions, so we
gather all of those things into a single location. The diff is confusing
because it cannot understand what code is semantically moved around in this
grand migration.
When a set of revisions was empty, we were using an empty tuple. We now return an
empty frozenset to ensure the object could be used in an operation that requires a
set.
Mistakes were made while resolving rebase conflicts in 92753bcc8aa2. This led to
'date' being preserved in metadata when reading markers from a binary stream.
As a result, some known markers were seen as "new" when pulling. I noticed it
because a no-op pulls from main added about 600 duplicated markers to my
obsstore (for each pull).
I do not believe we need to perform any specific action to actively
de-duplicates existing obsstore. After this fix, duplicated markers will no be
propagated and the few affected repositories can probably deal with duplication (or
people can repull the obsstore from a clone).
As a side effect, we decode metadata only once, reducing the impact of the hack
in fm0 to store extra important data (parents and date).
This function returns the highest common version between the locally known
formats and a list of remotely known formats. This is going to be useful to
know what format should be used for exchange.
We can now read and write any known format. The list of known formats
currently has one element (0). The obsstore itself is not aware of
multiple formats yet and always uses format 0.
If we are to introduce new formats we need to be able call different
functions for different formats. Creating a function for format
version 0 is the first step.
This change is because these formats are used for version 0 of the
obsstore format. This is going to be useful in the future when we
introduce new versions of the format.
All variables involved in the obsstore format are prefixed with `_fm`.
`_fnodesize` was the exception. It is now back in line.
This is meaningful as we'll need to distinguish between multiple versions of the
obsstore format.
The mergemarkers function now returns the number of unknown markers in
the stream that have been added to the obsstore. This is similar to what
`obsstore.add` already does.
The method gains a docstring in the process.
Simple profiling of `hg log -r .` revealed ~18,000 calls to
mercurial.i18n.gettext() on the author's repository. The
culprit was 3 _() calls in util.parsedate() multiplied by
~6000 obsmarkers originating from the parsing of obsmarkers.
Changing the obsmarker code to parse the stored format of
dates instead of going through a generic path eliminates these
gettext() lookups and makes `hg log -r .` execute ~10% faster
on the author's repo. The performance gain is proportional to
the number of obsmarkers.
The author attempted to patch util.parsedate() to avoid the
gettext() lookups. However, that code is whacky and the author
is jet lagged, so the approach was not attempted.
We add a ``relevantmarkers`` method to fetch all markers that seem relevant to a
set of nodes. See function documentation about how this set is computed. This
will let us exchange only the markers that seem "relevant" to the set of
changesets related to a push or a pull.
The approach used to define "relevant" has been successfully tested in evolve
for 6 months.
We use the new `parents` field to build a dictionary of markers that touch
children of a node. This will be used to link prune markers to a set of
exchanged nodes.
We now store the `parents` field on disk. We use the same strategy as for
`date`: We stick it into the metadata. This is slow and dirty, but this is also
the only way we currently have.
At some point we'll have a new obsstore format to store this properly.
We are extracting the `date` information from the metadata at read time.
However, we failed to remove it from the metadata returned in the markers. This
is now fixed.
We need this information to build the set of relevant markers during
exchanges. This can only be done at the `createmarkers` level since
the `obsstore.create` function does not have a repo and therefore has
no access to the parent information.
This field is intended to store the parent of the precursor. This is useful to
attach pruned changesets to a set of exchanged changesets. We currently just
add the fields with a None value. None stands for "no data recorded".
The markers are now 5-item tuples (concluded by the date). The obsstore.fields
contents have been updated accordingly.
There is no change to the on-disk format yet, so the date has to be extracted
every time we read binary markers and re-injected each team we write them. This
introduces a slowdown that will be solved when a new version of the format is
added. Such a slowdown was already introduced by the evolve extension anyway.
We are going to increase the amount of data explicitly stored in obsolescence
markers. This mean we are going to have a longer tuple and some values will be
shuffled around. So we add a ``fields`` attribute to the obsstore class to
keep track of what entry is what. This will be useful for extensions and for
documentation purpose.
The date will become an official field in the markers (and ultimately
in the on-disk format). We start by making it an official argument for
the function.
We are about to add more arguments to this function (date, parents, etc).
Passing metadata as a keyword argument gives us more flexibility when adding
them.