When a set of revisions was empty, we were using an empty tuple. We now return an
empty frozenset to ensure the object could be used in an operation that requires a
set.
Mistakes were made while resolving rebase conflicts in 92753bcc8aa2. This led to
'date' being preserved in metadata when reading markers from a binary stream.
As a result, some known markers were seen as "new" when pulling. I noticed it
because a no-op pulls from main added about 600 duplicated markers to my
obsstore (for each pull).
I do not believe we need to perform any specific action to actively
de-duplicates existing obsstore. After this fix, duplicated markers will no be
propagated and the few affected repositories can probably deal with duplication (or
people can repull the obsstore from a clone).
As a side effect, we decode metadata only once, reducing the impact of the hack
in fm0 to store extra important data (parents and date).
This function returns the highest common version between the locally known
formats and a list of remotely known formats. This is going to be useful to
know what format should be used for exchange.
We can now read and write any known format. The list of known formats
currently has one element (0). The obsstore itself is not aware of
multiple formats yet and always uses format 0.
If we are to introduce new formats we need to be able call different
functions for different formats. Creating a function for format
version 0 is the first step.
This change is because these formats are used for version 0 of the
obsstore format. This is going to be useful in the future when we
introduce new versions of the format.
All variables involved in the obsstore format are prefixed with `_fm`.
`_fnodesize` was the exception. It is now back in line.
This is meaningful as we'll need to distinguish between multiple versions of the
obsstore format.
The mergemarkers function now returns the number of unknown markers in
the stream that have been added to the obsstore. This is similar to what
`obsstore.add` already does.
The method gains a docstring in the process.
Simple profiling of `hg log -r .` revealed ~18,000 calls to
mercurial.i18n.gettext() on the author's repository. The
culprit was 3 _() calls in util.parsedate() multiplied by
~6000 obsmarkers originating from the parsing of obsmarkers.
Changing the obsmarker code to parse the stored format of
dates instead of going through a generic path eliminates these
gettext() lookups and makes `hg log -r .` execute ~10% faster
on the author's repo. The performance gain is proportional to
the number of obsmarkers.
The author attempted to patch util.parsedate() to avoid the
gettext() lookups. However, that code is whacky and the author
is jet lagged, so the approach was not attempted.
We add a ``relevantmarkers`` method to fetch all markers that seem relevant to a
set of nodes. See function documentation about how this set is computed. This
will let us exchange only the markers that seem "relevant" to the set of
changesets related to a push or a pull.
The approach used to define "relevant" has been successfully tested in evolve
for 6 months.
We use the new `parents` field to build a dictionary of markers that touch
children of a node. This will be used to link prune markers to a set of
exchanged nodes.
We now store the `parents` field on disk. We use the same strategy as for
`date`: We stick it into the metadata. This is slow and dirty, but this is also
the only way we currently have.
At some point we'll have a new obsstore format to store this properly.
We are extracting the `date` information from the metadata at read time.
However, we failed to remove it from the metadata returned in the markers. This
is now fixed.
We need this information to build the set of relevant markers during
exchanges. This can only be done at the `createmarkers` level since
the `obsstore.create` function does not have a repo and therefore has
no access to the parent information.
This field is intended to store the parent of the precursor. This is useful to
attach pruned changesets to a set of exchanged changesets. We currently just
add the fields with a None value. None stands for "no data recorded".
The markers are now 5-item tuples (concluded by the date). The obsstore.fields
contents have been updated accordingly.
There is no change to the on-disk format yet, so the date has to be extracted
every time we read binary markers and re-injected each team we write them. This
introduces a slowdown that will be solved when a new version of the format is
added. Such a slowdown was already introduced by the evolve extension anyway.
We are going to increase the amount of data explicitly stored in obsolescence
markers. This mean we are going to have a longer tuple and some values will be
shuffled around. So we add a ``fields`` attribute to the obsstore class to
keep track of what entry is what. This will be useful for extensions and for
documentation purpose.
The date will become an official field in the markers (and ultimately
in the on-disk format). We start by making it an official argument for
the function.
We are about to add more arguments to this function (date, parents, etc).
Passing metadata as a keyword argument gives us more flexibility when adding
them.
We now have a function taking a list and marker and returning an encoded
version. This will allow obsolescence marker exchange experimentation to easily
pushkey-encode markers to be pushed after selection.
The obsstore method now have a return value. This informs caller about the
actual creation of a new markers. No new markers are created if it would have
been a duplicate.
The `metadata` argument only allow to specify metadata for all new markers. We
extension the format of the `relations` argument to support optional metadata
argument.
The first user of this should be the evolve extension who want to store parent
information of pruned changeset in extra (until we make a second version of the
format)
The `obsstore` class have a `create` method that create new obsolescence marker
from node. There is another function in the same module `createmarkers`. This
other function is higher level and automatically missing meta data (ultimately
calling the first one)
We add a new comment in the docstring of `obsstore.create` highlighting that
people writing new code probably want to use the top level one.
The obsolescence marker exchange code was already extracted during a previous
cycle. We are moving the extracted functio in this module. This function will
read and write data in the `pulloperation` object and I prefer to have all core
function collaborating through this object in the same place.
This changeset is pure code movement only. Code change for direct consumption of
the `pulloperation` object will come later.
The obsolescence marker exchange code was already extracted during a previous
cycle. We are moving the extracted functio in this module. This function will
read and write data in the `pushoperation` object and I prefer to have all core
function collaborating through this object in the same place.
This changeset is pure code movement only. Code change for direct consumption of
the `pushoperation` object will come later.
Reminder: a changeset is said "bumped" if it tries to obsolete a immutable
changeset.
The previous algorithm for computing bumped changeset was:
1) Get all public changesets
2) Find all they successors
3) Search for stuff that are eligible for being "bumped"
(mutable and non obsolete)
The entry size of this algorithm is `O(len(public))` which is mostly the same as
`O(len(repo))`. Even this this approach mean fewer obsolescence marker are
traveled, this is not very scalable.
The new algorithm is:
1) For each potential bumped changesets (non obsolete mutable)
2) iterate over precursors
3) if a precursors is public. changeset is bumped
We travel more obsolescence marker, but the entry size is much smaller since
the amount of potential bumped should remains mostly stable with time `O(1)`.
On some confidential gigantic repo this move bumped computation from 15.19s to
0.46s (×33 speedup…). On "smaller" repo (mercurial, cubicweb's review) no
significant gain were seen. The additional traversal of obsolescence marker is
probably probably counter balance the advantage of it.
Other optimisation could be done in the future (eg: sharing precursors cache
for divergence detection)
Detection of bumped changeset should use `allprecursors(<mutable>)` instead or
`allsuccessors(<immutable>)` so we need the all precursors function to exists.
Before this patch, duplicated obsolescence markers could slip into an
obstore if the bookmark was unknown locally and duplicated in the
incoming obsolescence stream.
Existing duplicate markers will not be automatically removed but
they'll stop propagating. Having a few duplicated markers is harmless
and people have been warned evolution is <blink>experimental</blink>
anyway.
Having a dedicated function will allow us to experiment with other exchange
strategies in an extension. As we have no solid clues about how to do it right,
being able to experiment is vital.
Some transaction tricks are necessary for pull. But nothing too scary.
Having a dedicated function will allows us to experiment with other exchange
strategies in an extension. As we have no solid clues about how to do it right,
being able to experiment is vital.
I intended a more ambitious extraction of push logic, but we are far too
advanced in the release cycle for it.
Obsolescence creates a sparse DAG mostly composed of a lot of small independent
chain of markers. Date is the only simple and "reliable" way to sort them. The
existence of a date is now enforced at creation time as I'm more and more
convinced that date will have a key role in obsolescence markers exchange.
In their current state, revset calls can be very costly, as we test
predicates on the entire repository.
This change drops revset calls in favor of direct testing of the phase
of changesets.
Performance test on my Mercurial checkout
- 19857 total changesets,
- 1584 obsolete changesets,
- 13310 obsolescence markers.
Before:
! extinct
! wall 0.015124
After:
! extinct
! wall 0.009424
Performance test on a Mozilla central checkout:
- 117293 total changesets,
- 1 obsolete changeset,
- 1 obsolescence marker.
Before:
! extinct
! wall 0.032844
After:
! extinct
! wall 0.000066
In their current state, revset calls can be very costly, as we test
predicates on the entire repository.
This change drops a revset call in favor of direct testing of the
phase of changesets.
Performance test on my Mercurial checkout
- 19857 total changesets,
- 1584 obsolete changesets,
- 13310 obsolescence markers.
Before:
! suspended
! wall 0.014319
After:
! suspended
! wall 0.009559
Performance test on a Mozilla central checkout:
- 117293 total changesets,
- 1 obsolete changeset,
- 1 obsolescence marker.
Before:
! suspended
! wall 0.033373
After:
! suspended
! wall 0.000053
In their current state, revset calls can be very costly, as we test
predicates on the entire repository.
This change drops revset call in favor of direct testing of the
phase of changesets.
Performance test on my Mercurial checkout
- 19857 total changesets,
- 1584 obsolete changesets,
- 13310 obsolescence markers.
Before:
! unstable
! wall 0.017366
After this changes:
! unstable
! wall 0.008093
Performance test on a Mozilla central checkout:
- 117293 total changesets,
- 1 obsolete changeset,
- 1 obsolescence marker.
Before:
! unstable
! wall 0.045190
After:
! unstable
! wall 0.000032
In their current state, revset calls can be very costly as we test
predicates on the entire repository. As obsolete computation is used
by the "hidden" filter, it needs to be very fast.
This changet drops the revset call in favor of direct testing of
the phase of a changeset.
Performance test on my Mercurial checkout
- 19857 total changesets,
- 1584 obsolete changesets,
- 13310 obsolescence markers.
Before:
! obsolete
! wall 0.047041
After:
! obsolete
! wall 0.004590
Performance test on a Mozilla central checkout:
- 117293 total changesets,
- 1 obsolete changeset,
- 1 obsolescence marker.
Before:
! obsolete
! wall 0.001539
After:
! obsolete
! wall 0.000017
Recomputing the filtered revisions at every access to changelog is far too
expensive. This changeset introduce a cache for this information. This cache is
hold by the repository (unfiltered repository) and invalidated when necessary.
This cache is not a protected attribute (leading _) because some logic that
invalidate it is not held by the local repo itself.
Divergent changeset are final successors (non obsolete) of a changeset who
compete with another set of final successors for this same changeset.
For example if you have two obsolescence markers A -> B and A -> C, B and C are
both "divergent" because they compete to be the one true successors of A.
Public revision can't be divergent.
This function is used and tested in the next changeset.
Successors set are an important part of obsolescence. It is necessary to detect
and solve divergence situation. This changeset add a core function to compute
them, a debug command to audit them and solid test on the concept.
Check function docstring for details about the concept.
All obsolescence related sets need to be computed on the full unfiltered
version of the repository, in particular because several of them
(obsolete, extinct) are used to compute the hidden revisions.
On a filtered repo, revset predicates related to these sets will be
properly filtered because of revset's own pre-filtering.
The first obsolescence flag is introduced to allows for fixing the "bumped"
changeset situation.
bumpedfix == 1.
Creator of new changesets intended to fix "bumped" situation should not forget
to add this flag to the marker. Otherwise the newly created changeset will be
bumped too. See inlined documentation for details.
The old name was not very good for two reasons:
- caller does not care about "cache",
- set of revision returned may not be obsolete at all.
The new name was suggested by Kevin Bullock.
People were confused by the fact `obstore.precursors` contained marker allowing
to find "precursors" and vice-versa.
This changeset changes their meaning to:
- precursors[x] -> set(markers on precursors edges of x)
- successors[x] -> set(markers on successors edges of x)
Some documentation is added to clarify the situation.
Nullid as successors create multiple issues:
- Nullid revnum is -1, confusing algorithm that use revnum unless you add
special handling in all of them.
- Nullid confuses "divergent" changeset detection and resolution. As you can't
add any successors to Nullid without being in even more troubles
Fortunately, there is no good reason to use nullid as a successor. The only
sensible meaning of "succeed by nullid" is "dropped" and this meaning is already
covered by obsolescence marker with empty successors set.
However, letting some nullid successors to slip in may cause terrible damage in
such algorithm difficult to debug. So I prefer to perform and clear detection of
of such pathological changeset. We could be much smarter by cleaning up nullid
successors on the fly but it would be much for expensive. As core Mercurial does
not create any such changeset, I think it is fine to just abort when suspicious
situation is detected.
Earlier experimental version created such changesets, so there are some out
there. The evolve extension added the necessary logic to clean up its mess.
This function is designed to be used by all code that creates new
obsolete markers in the local repository.
It is not used by debugobsolete because debugobsolete allows the
use of an unknown hash as argument.
This changeset introduces caches on the `obsstore` that keeps track of sets of
revisions meaningful for obsolescence related logics. For now they are:
- obsolete: changesets used as precursors (and not public),
- extinct: obsolete changesets with osbolete descendants only,
- unstable: non obsolete changesets with obsolete ancestors.
The cache is accessed using the `getobscache(repo, '<set-name>')` function which
builds the cache on demand. The `clearobscaches(repo)` function takes care of
clearing the caches if any.
Caches are cleared when one of these events happens:
- a new marker is added,
- a new changeset is added,
- some changesets are made public,
- some public changesets are demoted to draft or secret.
Declaration of more sets is made easy because we will have to handle at least
two other "troubles" (latecomer and conflicting).
Caches are now used by revset and changectx. It is usually not much more
expensive to compute the whole set than to check the property of a few elements.
The performance boost is welcome in case we apply obsolescence logic on a lot of
revisions. This makes the feature usable!
Obsolete markers wide-usage and propagation should be avoided by default until
the obsolete feature is more mature.
This changeset introduce the `_enable` variable and prevent the creation of
obsolete marker if the feature is set to `False` (the default).
More limitation comes in followup changesets.
Obsolete markers are now exchanged in smaller pieces that fit in a http header.
This changes is done to avoid 400 bad request error when exchanging obsolete
puskey over http.
The last key pushed is always hold by the "dump0" key to ensure an easy place to
hook for people who need it.
The function is now able to write the version header as necessary. The function
now yield bytes to be written to a stream.
This should ease later use of this function for wireprotocol based exchanged.
Prepare the public use of the writemarker by wireprotocol function.
This function yield every nodes which succeed to a group of nodes.
The first user will be checkheads who need to know if we push successors for
remote extra heads.
Marker are now written as soon as possible but within a transaction. Using a
transaction ensure a proper behavior on error and rollback compatibility.
Flush logic are not necessary anymore and are dropped from lock release.
With this changeset, the obsstore is open, written and closed for every single
added marker. This is expected to be highly inefficient and batched write should
be implemented "quickly".
Another issue is that every flush of the file will invalidate the obsstore
filecache and trigger a full re instantiation of the repo.obsstore attribute
(including, reading and parsing entry). This is also expected to be highly
inefficient and proper filecache operation should be implemented "quickly" too.
A side benefit of the filecache issue is that repo.obsstore object is properly
invalidated on transaction abortion.
This is the second step toward incremental writing of marker inside a
transaction. The obsstore file is now handled append only.
Header writing have been extracted from _writemarkers.
Because the _writemarkers method have been dropped, the push code
directly reuse the serialised content of local repo `listkeys`. This
is not very pretty, but this part of the protocol still need major
improvement anyway.
This is the first step toward incremental writing of obsolete marker within a
transaction.
For this purpose, obsstore is now given its repo sopener. This make it able to
handles read and write to the obsstore file itself. Most IO logic is removed
from localrepo and handled by obsstore object directly.
An `obsolete` boolean property is added to changeset context. Function to get
obsolete marker object from a changeset context are added to the obsolete
module.