Commit Graph

197 Commits

Author SHA1 Message Date
Remy
28ab504b21
Bump test_sha in perf tests (#6825)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-22 11:00:53 +00:00
Moritz Kiefer
edd84a09d5
Fix reference to return produced by ApplicativeDo (#6821)
* Fix reference to return produced by ApplicativeDo

see https://github.com/digital-asset/ghc/pull/53 for details.

fixes #6820

changelog_begin
changelog_end

* bump to merged commit

changelog_begin
changelog_end

* switch to new ghc-lib

changelog_begin
changelog_end
2020-07-22 10:09:23 +00:00
Remy
d538d9a53e
Bump test_sha in perf tests (#6816)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-21 16:56:01 +00:00
Robert Autenrieth
7ce9748066
Split sandbox code into separate packages (#6695)
* Move public code into daml-integration-api

CHANGELOG_BEGIN
[DAML Integration Kit]: Removed sandbox specific code from the API intended to be used by ledger integrations. Use the maven coordinates ``com.daml:participant-integration-api:VERSION`` instead of ``com.daml:ledger-api-server`` or ``com.daml:sandbox``.
CHANGELOG_END
2020-07-17 17:06:06 +02:00
Moritz Kiefer
147a2700c0
Bump Windows cache (#6770)
To “fix” the “output was not created” errors.

changelog_begin
changelog_end
2020-07-17 12:41:10 +02:00
Moritz Kiefer
52b9eabbcc
Revert "refactor ci jobs: add setvar to ci/lib.sh (#6708)" (#6732)
This reverts commit 61e9df3eaf.

This interacts very badly with the fact that we check out old commits
for releases. While we could fix it for this particular issue, I don’t
think it buys us enough to be worth doing, and it makes it easy to
introduce issues in the future if we modify lib.sh.

changelog_begin
changelog_end
2020-07-14 23:53:49 +02:00
Gary Verhaegen
61e9df3eaf
refactor ci jobs: add setvar to ci/lib.sh (#6708)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-13 17:34:54 +02:00
Moritz Kiefer
631ed3e891
Bump timeouts in compat tests (#6689)
This bumps the timeout of the compat tests on PRs to 360 minutes,
matching other jobs on a PR (we mainly hit this if ghc-lib is rebuilt),
and the timeout on the daily jobs to 720 minutes (we hit this if
_everything_ is rebuilt).

I am slightly worried about the timeout on the daily job. After having
taken a look at it, there are a few reasons how we ended up here:

1. We started including more tests, e.g., sandbox-classic. Not much we
   can do here, those tests are useful.

2. We have a very large number of snapshots for 1.3.0. There are a few
   reasons for this:

   1. Timing: We branched off early for the 1.2.0 release so the first
      snapshot for 1.3 was on June 3rd. For 1.4 it looks like the first
      snapshot will be on July 15th so that’s roughly 2 extra
      snapshots just due to timing.

   2. Additional snapshots: We had one broken snapshot due to a broken
      VSCode extension that we didn’t delete (probably not worth doing
      at this point). We also had to backport to an old snapshot which
      resulted in another extra snapshot. We also had one extra
      snapshot which was supposed to be the RC but wasn’t since the
      ANF revert needed to go in.

   The only thing that is clearly useless is the one broken snapshot
   but that doesn’t change things that much. I see 2 orthogonal
   options for improving this, assuming we agree that the current
   runtime is worryingly high.

   1. Prune snapshots more aggressively, e.g., only include the last 3
      snapshots. That’s a pretty arbitrary decision but it would
      enforce a hard limit.

   2. Reduce test combinations. E.g., only test snapshots vs stable
      releases but not snapshots vs snapshots.

3. We end up forcing a full build quite frequently. Here are just 2
   examples of how we’ve done that so far.

   1. Upgrade rules_haskell. Basically all tests are run by a Haskell
      binary so this forces a full rebuild.

   2. Change runfiles of `daml`.

I don’t think there is much we can do about 1 or 3 which leaves us
with 2. One not entirely unreasonable option is to just do nothing. We
did have periods where things went pretty smoothly for the most part
and each month we reset to a much smaller number of releases (we also
have to start throwing out old stable releases at some
point). Otherwise reducing the number of test combinations seems the
most promising option to me.

changelog_begin
changelog_end
2020-07-10 12:34:53 +00:00
Moritz Kiefer
6c0bbd3ba6
Bump test_sha in perf tests (#6649)
This changed due to the revert of the ANF changes, which is harmless by
the same reasoning that made bumping it harmless when we introduced it.

changelog_begin
changelog_end
2020-07-08 12:26:11 +00:00
Samir Talwar
89369b3bb9
CI: Increase the PostgreSQL connections from 100 to 200. (#6647)
We saw a flake recently where PostgreSQL stopped accepting connections
during a CI run, causing the build to fail. This increases the maximum
number of connections to 200 from the default of 100, hopefully
mitigating issues such as this one.
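
For reference, a minimal sketch of how such a limit can be raised when
starting an instance; `$PGDATA` and the exact flags here are
assumptions, and the real invocation lives in the CI scripts:

```
# Pass the higher limit straight to the postgres server process;
# 200 replaces the default max_connections of 100.
pg_ctl start -D "$PGDATA" -o "-c max_connections=200"
```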

CHANGELOG_BEGIN
CHANGELOG_END
2020-07-08 10:49:11 +00:00
Moritz Kiefer
ade99dd2c1
Reset windows cache (#6604)
We are seeing caching errors again.

changelog_begin
changelog_end
2020-07-03 16:36:35 +00:00
nickchapman-da
14ca4e5e79
bump-perf (#6553)
changelog_begin
changelog_end
2020-06-30 22:08:36 +00:00
Gary Verhaegen
8539873d84
document shared memory segment issue (#6546)

After discussion with @SamirTalwar-DA, we agree the CI script to clear
memory segments is a bit too dangerous to make it easy to run on
developer machines. Still, developers may run into similar issues if
they run lots of tests and/or do not reboot their laptop frequently.
On developer laptops, we usually spawn one PostgreSQL instance per
build/test that needs it (as opposed to CI, where we create a single one
for the entire build; see `build.sh`), so shared memory segments can
actually build up fairly quickly in some scenarios.

As an alternative, I have added a section to the README to cover what to
do if that issue happens.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-30 17:48:14 +02:00
Gary Verhaegen
beb33f2ab1
add explanation for clearing shared segments (#6545)
As requested on #6530.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-30 13:21:32 +00:00
Gary Verhaegen
55776f92ba
clear shared memory segment on macOS (#6530)
For a while now we've had errors along the line of

```
FATAL:  could not create shared memory segment: No space left on device
DETAIL:  Failed system call was shmget(key=5432001, size=56, 03600).
HINT:  This error does *not* mean that you have run out of disk space.
It occurs either if all available shared memory IDs have been taken, in
which case you need to raise the SHMMNI parameter in your kernel, or
because the system's overall limit for shared memory has been reached.
        The PostgreSQL documentation contains more information about
shared memory configuration.
child process exited with exit code 1
```

on macOS CI nodes, which we were not able to reproduce locally. Today I
managed to, sort of by accident, and that allowed me to dig a bit
further.

The root cause seems to be that PostgreSQL, as run by Bazel, does not
always seem to properly unlink the shared memory segment it uses to
communicate with itself. On my machine, running:

```
bazel test -t- --runs_per_test=100 //ledger/sandbox:conformance-test-wall-clock-postgresql
```

and eyeballing the results of

```
watch ipcs -mcopt
```

I would say about one in three runs leaks its memory segment. After much
googling and some head scratching trying to figure out the C APIs for
managing shared memory segments on macOS, I kind of stumbled on a
reference to `ipcrm` in a comment on some low-ranking StackOverflow
answer. It looks like it’s working very well on my machine, even if I
run it while a test (and therefore an instance of pg) is running. I
believe this is because the command does not actually remove the shared
memory segments, but simply marks them for removal once the last process
stops using them. (At least that’s what the manpage describes.)
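
As a rough sketch of the kind of cleanup this enables (the loop below
is an assumption, not the actual CI script):

```
# List shared memory segments, keep those owned by the current user,
# and mark each for removal. ipcrm does not destroy a segment outright;
# it disappears once the last attached process detaches, so this is
# safe to run while a PostgreSQL instance is still up.
for id in $(ipcs -m | awk -v user="$USER" '$0 ~ user {print $2}'); do
  ipcrm -m "$id"
done
```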

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-30 01:40:16 +02:00
Remy
f5c65696f7
Update LF Perf test SHA (#6510)
CHANGELOG_BEGIN
CHANGELOG_END
2020-06-26 12:11:50 +00:00
Shayne Fletcher
4d896bc3bd
Update ghc-lib, da-ghc-master-8.8.1 (#6460)
changelog_begin
changelog_end
2020-06-23 08:29:16 -04:00
Gary Verhaegen
7d3dae4b1f
update perf-sha (#6457)
CHANGELOG_BEGIN
CHANGELOG_END
2020-06-22 18:46:19 +02:00
Gary Verhaegen
2923048935
remove purge_old_agents (#6439)
This script was supposed to remove old agents from the Azure Pipelines
UI. It may have been useful at some time (notably, when we used
ephemeral instances, they did not necessarily get to run their shutdown
script), but as it stands now, it's broken. The output from that step
ends in:

```
error: 2 derivations need to be built, but neither local builds ('--max-jobs') nor remote builds ('--builders') are enabled
```

after listing the nix packages it would build. Furthermore, it does not
seem to be useful as I have not seen any spurious entry in the agents
list on Azure since we switched to permanent nodes, on either the Linux
or Windows side (and this would only run on Linux, if it ran).

I'm also not convinced it ever ran, as I used to see a lot of spurious
machines on both Linux and Windows when we did use ephemeral instances.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-20 17:37:24 +02:00
Shayne Fletcher
cec2693dc7
enable -Wunused-matches (#6423)
changelog_begin
changelog_end
2020-06-19 19:35:10 +00:00
Remy
149bfc89ff
Update LF Perf test SHA (#6416)
CHANGELOG_BEGIN
CHANGELOG_END
2020-06-18 14:27:26 +00:00
Moritz Kiefer
2c1d4cb805
Fix nix installation (#6400)
Installing Nix now requires curl’s -L flag; I’ve gone ahead and just
normalized everything to use -sfL, which we were already using in one
place.
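
For illustration, the normalized pattern looks like this (the exact
URL is an assumption):

```
# -s: silent, -f: fail on HTTP errors, -L: follow redirects, which the
# Nix download now requires.
curl -sfL https://nixos.org/nix/install | sh
```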

changelog_begin
changelog_end
2020-06-18 10:34:08 +02:00
Moritz Kiefer
7e0a684857
Bump Windows cache (#6383)
changelog_begin
changelog_end
2020-06-17 19:33:26 +02:00
Moritz Kiefer
a178f62613
Fix packaging performance (#6350)
fixes #3150

This PR introduces a patch to GHC to fix the performance of the
pattern match checker in the presence of multiple packages which
is currently significantly (orders of magnitude) slower than having
everything in a single package. I also added a test case that hits
this. Here’s what you need to hit this issue:

1. A typeclass with a functional dependency. `HasField` is the obvious
   candidate for this.

2. A lot of instances of this typeclass in a separate package (this is
   the only part where the separate package matters).

3. A reasonably large ADT with a bunch of strict fields.

4. A pattern match in the context of some constraints of the
   typeclass. The constraints can be completely unused.

In that case, you will get a slowdown that scales with the number of
instances, number of constructors, and number of constraints (I didn’t
verify if it’s linear, but it is significant, which is all that
matters).
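
As a purely hypothetical way to observe the slowdown from the outside
(the project layout here is made up):

```
# Same code, compiled once as a single package and once split across
# packages; before this patch the second build is dramatically slower.
time daml build --project-root single-package
time daml build --project-root split-into-packages
```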

Here’s why this happens:

1. The pattern match checker checks for strict fields if the type is
   inhabited.

2. This calls `pmTopNormaliseType_maybe` to normalize a type (the details don’t
   matter), which in turn calls into the typechecker. This function is
   called very often (presumably linear in the number of constructors,
   but I didn’t verify).

3. The typechecker has some logic in `improveFromInstEnv` for
   generating additional equations by unifying functional
   dependencies `a -> b` with constraints in scope
   and thereby deducing information about `b`.

4. In the pattern match checker the list of instances of the home
   package is empty since the pattern match checker (apparently)
   doesn’t actually care about those extra equations. However, the
   list of instances in the EPS is not empty. This is the issue here:
   by moving the instances to an external package, we suddenly end up
   with thousands of instances that we try to unify with the functional
   dependencies every time we normalize, which happens very often.

Proposed fix:

The solution is rather simple: since the pattern match checker
apparently does not care about the instances of the home package, it
almost certainly doesn’t care about instances in general, so we just
empty the instances of external packages explicitly.

Is the fix correct?

1. I verified that the GHC test suite passes with this patch which
   gives me a reasonable level of confidence.

2. I verified that our own test suite passes.

3. The most dodgy part is actually emptying the instances, since the
   whole EPS stuff is a mutable mess. What could in theory happen is that
   the PM ends up loading an interface file that mutates this
   again. However, as far as I understand it is impossible for the PM
   to need an interface that the typechecker didn’t already need. I did
   do a bunch of debugging and this is exactly what I observed in my
   experiments.

Alternative ideas and upstreaming:

The other option would be to not try and mess with the EPS but somehow
have a conditional flag somewhere in the typechecker env to disable
this logic in the pattern match checker. However, that sounds
significantly more complex so I don’t think it’s worth the effort.

GHC 8.10 has a new pattern match checker that has different
performance characteristics and seems to do much better here, so there
is little reason to try and upstream this. I strongly want to avoid
upgrading DAML to 8.10 at this point (too much risk; let’s wait until
things calm down).

changelog_begin

- [DAML Compiler] Fix an issue where compilation slowed down
  significantly when code was split up into several packages. See
  https://github.com/digital-asset/daml/issues/3150

changelog_end
2020-06-16 15:12:34 +02:00
nickchapman-da
e19888d979
update for no stack-tracing in speedy perf (#6363)
2020-06-16 11:36:05 +00:00
Gary Verhaegen
1300644668
fix error message on daily compat failure (#6337)
When I changed the quoting for the success case as part of #6267, I
forgot to update the error case, so now we don't get well-formed JSON
for errors.
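
Illustratively, the error branch needs to produce a single well-formed
JSON object just like the success branch (the message text and variable
names here are made up):

```
# Quote the whole payload so Bash word splitting cannot break the JSON.
curl -XPOST -H 'Content-Type: application/json' \
     -d "{\"text\": \"daily compat check failed: $BUILD_URL\"}" \
     "$SLACK_WEBHOOK_URL"
```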

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-14 22:52:57 +02:00
Andreas Herrmann
d1e422580a
Increment Windows cache URL (#6321)
We've seen a series of failures of the form
```
ERROR: D:/a/1/s/daml-assistant/integration-tests/BUILD.bazel:162:1: output 'daml-assistant/integration-tests/create-daml-app-tests.exe' was not created
ERROR: D:/a/1/s/daml-assistant/integration-tests/BUILD.bazel:162:1: not all outputs were created or valid
```
across multiple machines. We suspect cache poisoning as the cause. This
increments the cache URL to effectively clear the cache.

changelog_begin
changelog_end

Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>
2020-06-12 15:33:38 +02:00
Moritz Kiefer
7717574d00
Bump Windows cache (#6310)
We are seeing

```
ERROR: D:/a/2/s/compiler/scenario-service/protos/BUILD.bazel:67:1:
output
'compiler/scenario-service/protos/_obj/scenario_service_haskell_proto/ScenarioService.o'
was not created
```

again, so following our experiments, let’s reset the cache to see if it
fixes anything.

changelog_begin
changelog_end
2020-06-11 16:26:31 +02:00
Gary Verhaegen
9c8c1fa909
slightly safer docs cron: fail instead of error (#6288)
See @cocreature's comment on #6285.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-10 19:18:14 +02:00
Gary Verhaegen
485069f017
fix docs cron for release notes (#6285)
Thinking about the upcoming release, I realized our current docs cron
has somehow lost the step of taking the release notes from the
triggering commit, probably in all the back-and-forth about which
release notes version to use to overwrite all the other ones.

This restores that, and adapts the algorithm for the new, multi-line
LATEST file format.

This _should_ work for all the current history, including releases made
on `release/*` branches and the unifying commit that turned the LATEST
file multiline (it adds more than one line so won't be matched as a
trigger commit).

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-10 14:43:23 +02:00
Moritz Kiefer
20d26394e1
Modify the cache URL instead of relying on platform_suffix (#6273)
For some reason, platform_suffix doesn’t seem to provide enough
isolation to fix the “undeclared inclusion” errors even though it does
fix the issues for me locally.

This PR tries to address the problem by switching from
`platform_suffix` to modifying the actual URL of the cache.

To avoid leaking stuff from the local cache, I’ve added a `clean
--expunge` for now. We should be able to remove this once nodes have
been reset tomorrow. It will slow down nodes, but that is clearly
better than having everything fail.
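
Schematically, the change amounts to something like this (the bucket
URL and variable name are assumptions; see `ci/configure-bazel.sh` for
the real logic):

```
# Bumping CACHE_URL_SUFFIX points Bazel at a fresh, empty remote cache.
CACHE_URL_SUFFIX=1
echo "build --remote_cache=https://example-cache.storage.googleapis.com/${CACHE_URL_SUFFIX}" >> .bazelrc
# One-off: drop the local cache too, so stale local entries cannot leak
# into the fresh remote cache.
bazel clean --expunge
```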

changelog_begin
changelog_end
2020-06-09 17:05:19 +02:00
Moritz Kiefer
aac1e16794
Fix caching on Linux and MacOS (#6270)
When bumping the cache URL on Windows, I accidentally also changed the
URL we push to on Linux and MacOS. This is obviously a bad idea, so
this PR fixes it.

changelog_begin
changelog_end
2020-06-09 08:08:06 +00:00
Gary Verhaegen
664df64e13
fix daily perf Slack notification (#6267)
This PR fixes the Slack notification on daily perf runs. It also updates
the perf sha.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-09 06:45:58 +00:00
Moritz Kiefer
1d3c8f3390
Bump cache suffix (#6265)
* Bump cache suffix

As discussed, we are going to bump this every time we feel like
resetting the cache might help. This is a temporary measure to get
some metrics on how often things break and if resetting the cache
helps.

changelog_begin
changelog_end

* Update configure-bazel as well

changelog_begin
changelog_end
2020-06-08 17:15:12 +02:00
Moritz Kiefer
f1822f6daa
Fix variable in daily slack notifications (#6221)
Currently the report fails with `variables[Build.SourceBranchName]:
command not found`, which is obviously not what we want (it’s mixing up
the syntax of Azure’s YAML config and Bash). Looking at the
code in tell-slack-failed.yml, this one does seem to work, but I
haven’t tested it, so :crossed-fingers:.
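
For reference, in a Bash step Azure exposes pipeline variables as
environment variables, so the working pattern is roughly this (the
message text is assumed):

```
# Azure maps Build.SourceBranchName to $BUILD_SOURCEBRANCHNAME inside
# script steps; the variables[...] macro syntax only works in the YAML.
echo "daily check failed on branch ${BUILD_SOURCEBRANCHNAME:-unknown}"
```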

changelog_begin
changelog_end
2020-06-04 12:41:36 +02:00
Gary Verhaegen
2fe320fe48
automated ghc-lib build (#6188)

This PR aims at automating the build of ghc-lib. The current process
still has a few manual steps; it needs to be updated because Bintray is
going away, so this seemed like a good opportunity to fully automate it.

This works like the "patch bazel on Windows" jobs: the filename will
contain a hash of the `ci/da-ghc-lib` folder, and the job will run only
if the corresponding filename does not yet exist on the GCS bucket. PRs
aiming at changing the ghc-lib version will need to run twice: once to
create the artifacts, and once to change the `stack-snapshot.yaml` file
to match.
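
The gating logic is, in rough outline (the bucket and artifact names
below are made up):

```
# Hash the ci/da-ghc-lib folder; sort first so the hash is stable.
HASH="$(find ci/da-ghc-lib -type f -print0 | sort -z | xargs -0 sha256sum | sha256sum | awk '{print $1}')"
TARGET="gs://example-ghc-lib-bucket/ghc-lib-${HASH}.tar.gz"
if gsutil -q stat "$TARGET"; then
  echo "ghc-lib for ${HASH} already exists, skipping build"
else
  echo "building ghc-lib for ${HASH}"
  # ... build, then: gsutil cp ghc-lib.tar.gz "$TARGET"
fi
```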

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-04 12:05:03 +02:00
Moritz Kiefer
b993339844
Include rules_haskell revision in platform suffix (#6209)
* Include rules_haskell revision in platform suffix

Hopefully this makes CI a bit less of a dumpster fire. I’ve also
followed the comment and made the suffix actually 3 characters long
instead of 2, since that makes me worry less about collisions and
should hopefully still be short enough to not hit MAX_PATH.

changelog_begin
changelog_end

* Update ci/configure-bazel.sh

Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>

Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
2020-06-03 21:33:37 +02:00
Gary Verhaegen
445f6467d9
daily run: warn on master only (#6177)
Currently the message to Slack is always triggered by running the daily
checks. This means that it gets very noisy to:

1. Run the check on PRs affecting the check (like this one),
2. Rerun the check multiple times to ascertain that a given failure is
   flaky.

With this PR, the message to Slack is replaced with a simple `echo` when
these checks are not run from the `master` branch, so whoever (manually)
triggered them can still get feedback on the result, but other people
don't get spurious `@here` mentions.
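
Schematically (variable names are assumptions; the real step lives in
the CI config):

```
MESSAGE='{"text": "<!here> daily checks failed"}'
if [[ "${BRANCH:-}" == "master" ]]; then
  curl -XPOST -H 'Content-Type: application/json' \
       -d "$MESSAGE" "$SLACK_WEBHOOK_URL"
else
  # Feedback for whoever triggered the run, without pinging anyone.
  echo "not on master; would have posted to Slack: $MESSAGE"
fi
```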

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-03 16:36:05 +02:00
Moritz Kiefer
405f3ad6ee
Sort files when calculating CACHE_KEY (#6173)
* Sort files when calculating CACHE_KEY

The order returned by `find` is unspecified and seems to have changed
for whatever reason in some cases. This changed the cache key, which is
obviously not intended. It looks like the one we currently have in our
scoop manifest is the one that we get by sorting; reversing the sort
produces the one CI currently calculates.
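
A minimal sketch of the fix; the actual file set and hash command may
differ:

```
# Sorting makes the key independent of find's (unspecified) traversal
# order, so the same files always produce the same cache key.
CACHE_KEY="$(find ci -type f -print0 | sort -z | xargs -0 md5sum | md5sum | awk '{print $1}')"
```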

changelog_begin
changelog_end

* update manifest to match CI output

Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
2020-05-31 22:02:13 +02:00
Gary Verhaegen
90547e6ab4
build old docs with their release notes (#6128)
In light of #6127, I kept wondering why rebuilding 1.1.1 would fail. The
problem addressed by #6127 is that we tried to rebuild it, which we
shouldn't, but the reason I noticed it is that the build failed, and
there is no good reason for the 1.1.1 docs to not build anymore. Looking
at the logs confused me even more as it failed with (elided):

```
docs/source/support/new-assistant.rst:
WARNING: document isn't included in any toctree
```

and that change happened _after_ 1.1.1. So I went back to the code, and
discovered I somehow had gotten confused as I changed the approach
mid-way through editing the file. If we're overwriting the
`release-notes.html` file post-build, which we are now doing (and is the
reason for ignoring it when checking checksums), then we should not be
touching the `release-notes.rst` file pre-build.

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-27 22:19:18 +00:00
Gary Verhaegen
e2d416e335
fix docs cron not ignoring release-notes (#6127)
The docs cron is supposed to ignore the release-notes.html page when
checking whether a docs folder is corrupted, because we manually
override it. However, that currently doesn't work, either because the
`sed` version we are using does not support changing the delimiters, or
because no version of `sed` does and I just imagined it.
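
For what it’s worth, both sed forms are standard, but they differ in an
easy-to-miss way (the patterns below are illustrative):

```
# Alternate delimiters work unescaped in the s command...
echo 'docs/1.1.1/release-notes.html' | sed 's|release-notes.html|OVERRIDDEN|'
# ...but an address pattern needs a leading backslash before the
# custom delimiter.
printf 'index.html\nrelease-notes.html\n' | sed '\|release-notes|d'
```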

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-27 18:47:40 +02:00
Gary Verhaegen
ccb496ee0d
update perf test sha (#6125)
Changed by #6123; the relevant part of the diff is:

```
           ledger.lookupGlobalContract(ParticipantView(committers.head),
effectiveAt, acoid) match {
-            case LookupOk(_, result) =>
+            case LookupOk(_, result, _) =>
               cachedContract = cachedContract + (step -> result)
```

which seems benign enough.

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-27 15:10:11 +00:00
Gary Verhaegen
6e48abc793
update perf benchmark following #6080 (#6120)
This should be merged after #6080. This PR adds a patch (and
consequently updates the `ci/cron/perf/compare.sh` script) to apply the
same logical change as #6080 on top of the baseline commit, so our
performance comparison remains "apples to apples".

I am well aware that managing patches is not going to be a great way
forward. The rate of changes on the benchmark seems to be slow enough
that this is good enough for now, but should we change the benchmark
more often and/or want to add new benchmarks, a better approach would be
to handle the changes at the Scala level. That is:

- Create a "rest of the world" (world = Speedy, its compiler, and all of
  the associated types) interface that benchmarks would depend on,
  rather than depend directly on the rest of the codebase.
- Create two implementations of that interface, one that compiles
  against the current state of the world, and one that compiles against
  the baseline.
- Change the script to load the relevant implementation, and then run
  all the benchmarks as-is, with no patch necessary.

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-27 13:34:08 +02:00
Gerolf Seitz
d55ebf08ec
Use Sandbox Classic as DAML on SQL (#6095)
CHANGELOG_BEGIN
CHANGELOG_END
2020-05-27 08:31:27 +02:00
Gary Verhaegen
9c7c8918a3
fix fatjar versions (#6091)
The version is taken from the env var (or defaults to 0.0.0) at
build time. Since those two packages are not built by default by Bazel,
we need to add the env var to the Bash step where they do get
explicitly built.
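
Schematically, the Bash step gains something like this (the variable
name and target labels here are assumptions):

```
# Export the release version before explicitly building the two
# fatjars, since they are not part of the default Bazel build set.
export DAML_SDK_RELEASE_VERSION="${RELEASE_TAG:-0.0.0}"
bazel build //ledger/sandbox:sandbox-binary_deploy.jar \
            //ledger-service/http-json:http-json-binary_deploy.jar
```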

Fixes #6090.

CHANGELOG_BEGIN
- sandbox and http-api fatjars will now display correct version number.
CHANGELOG_END
2020-05-25 15:59:23 +02:00
nickchapman-da
fb6cafa311
Bump the sha for CI perf (#6078)
changelog_begin
changelog_end
2020-05-22 16:24:18 +00:00
Moritz Kiefer
629ec732dd
Include puppeteer tests in compat tests (#6018)
* Include puppeteer tests in compat tests

This PR adds the puppeteer based tests to the compatibility
tests. This also means that they are now actually compatibility
tests. Before, we only tested the SDK side.

Apart from process management being a nightmare on Windows as usual,
there are two things that might stick out here:

1. I’ve replaced the `sh_binary` wrapper by a `cc_binary`. There is a
   lengthy comment explaining why. I think at the moment, we could
   actually get rid of the wrapper completely and add Java to the PATH
   in the tests that need it, but at least for now, I’d like to keep it
   until we are sure that we don’t need to add more to it (and then
   it’s also in the git history if we do need to resurrect it).
2. These tests are duplicated now, similar to the `daml ledger *`
   tests. The reasoning here is different. They depend on the SDK
   tarball either way, so performance-wise there is no reason to keep
   them. However, we reference the other file in the docs, which means
   we cannot change it freely. What we could do is make this
   sufficiently flexible to handle both the `daml start` case and
   separate `daml sandbox`/`daml json-api` processes, and then we can
   reference it in the docs. There is still added complexity for
   Windows, but that’s necessary for users who want to run
   this on Windows as well, so it seems unavoidable. (I should probably also
   remove my snarky comments 😇) I’d like to keep it duplicated
   for this PR and then we can clean it up afterwards.

changelog_begin
changelog_end

* Bump timeouts

changelog_begin
changelog_end
2020-05-22 14:02:59 +02:00
Gary Verhaegen
957a74c325
fix trailing newline in docs cron (#6053)
CI currently errors with:

```
Subprocess:
git checkout efe6545c2c
 -- docs/source/support/release-notes.rst
failed with exit code 127; output:
---

---
err:
---
Previous HEAD position was 2af134c... WIP: Draft version constraint
generation (#5472)
HEAD is now at efe6545... 1.2.0-snapshot.20200520.4224.0.2af134ca
(#6040)
/bin/sh: 2: --: not found

---

```

because the line

```
latest_release_notes_sha <- shell "git log -n1 --format=%H HEAD -- LATEST"
```

will assign a string that ends in a newline, and then when we try to
construct the shell command:

```
(shell_ $ "git checkout " <> latest_sha <> " -- docs/source/support/release-notes.rst")
```

we actually get two lines for Bash to execute.
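
For contrast, plain Bash would not hit this, since command substitution
strips the trailing newline; the Haskell `shell` helper keeps it, so
the result needs trimming before being spliced into another command:

```
# Safe in Bash: $(...) drops the trailing newline from git's output.
latest_sha="$(git log -n1 --format=%H HEAD -- LATEST)"
git checkout "$latest_sha" -- docs/source/support/release-notes.rst
```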

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-20 18:26:27 +02:00
Gary Verhaegen
94122ec561
fix docs cron (#6049)
The current version yields:

```
Subprocess:
git log -n1 --format=%H master -- LATEST
failed with exit code 128; output:
---
---
err:
---
fatal: bad revision 'master'
---
```

so apparently we can't trust a CI run on master to have a master branch
defined. `HEAD` should work, though.
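
The fix is simply to anchor the query on `HEAD`:

```
# HEAD always exists in a checkout, even when no master branch does.
git log -n1 --format=%H HEAD -- LATEST
```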

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-20 16:37:47 +02:00
Gary Verhaegen
fb6dc904a4
trigger all releases from master (#6016)

The 1.1.0 release went wrong and we had to trash it and release 1.1.1
instead. This is an attempt at identifying and correcting the root
cause behind that incident.

To understand the situation, we need to know how releases worked before
1.0. We had a one-line file called `LATEST` that specified the git SHA and
version tag for the latest release. A change to that file triggered a
release with the specified release tag, built from the source tree of
the specified commit. The `LATEST` file looked something like:

```
f050da78c9 1.0.0-snapshot.20200411.3905.0.f050da78
```

To mark a release as stable, we would change it to look like this:

```
f050da78c9 1.0.0
```

i.e. simply drop the `-snapshot...` suffix. Even though the commit (and
thus the entire source tree we build from) is the same, we would need to
rebuild almost all of our release artifacts, as they embed the version
tag in various places and ways. That worked well as long as we could
assume we were doing trunk-based development, i.e. all releases would
always come from the same (`master`) branch.

When we released 1.0, and started work on 1.1, we had a few bug reports
for 1.0 that we decided should be resolved in a point release. We
decided that the best way to handle that would be to have a branch
starting on the release commit for 1.0, and then backport patches from
`master` to that branch. We adapted our build process to also watch the
`release/1.0.x` branch and, in particular, trigger a new release build if
the `LATEST` file in that branch changed. That worked well.

The plan going forward was to keep doing regular snapshot releases from
the `master` branch, and create support, point releases ("patch" releases
in semver) from dedicated branches.

On April 30, we made a snapshot release as an RC for 1.1.0, by changing
the `LATEST` file in the `master` branch. That release was built on commit
681c862d. On May 6, we decided to take a new snapshot as the RC for
1.1.0; we changed `LATEST` in `master` to designate 7e448d81 as the new
latest release.

On May 11, we noticed an issue that broke our builds. Without going into
details, an external artifact we depend on had changed in incompatible
ways. After fixing that on `master`, we reasoned that this would also
break the build of the final 1.1.0 release if we just tried to build
7e448d81 again. But as the target release date was May 13, we did not
want to take a new snapshot after that fix, as that would have included
one more week of work in the release, and given us no time to test it.

So we did what we did for the 1.0 branch, as it had worked well: we
created a branch that forked from `master` at commit 7e448d81 and called
it `release/1.1.x`, then cherry-picked the one fix to our build process to
work around the broken download. When the time came to make the final
1.1.0 build on May 13, we naturally picked the `LATEST` file from the
`release/1.1.x` branch and dropped the `-snapshot...` suffix. Importantly,
we did not need to update the target commit to include the "broken
download" fix as, in the meantime, the internet had fixed itself, and we
thus reasoned we should go for the exact code of the RC rather than
include an unnecessary, albeit seemingly harmless, change.

Everything went well with the release process. Tests went well too. Then
we got a report that an application that worked against the latest RC
broke with the final 1.1.0. The issue was that we had built the wrong
commit: by branching off at the point of the _target_ commit for the
latest snapshot, we did not have the change to the `LATEST` file that
designated that commit as the target. So the `LATEST` file in
`release/1.1.x` was still pointing to 681c862d.

I believe the root cause for this issue is the fact that we have
scattered our release process over multiple branches, meaning there is
no linear history of what was released and we are relying on people
being able to mentally manage multiple timelines. Therefore, I propose
to fix our release process so this should not happen again by
linearizing the release process, i.e. getting back to a situation where
all releases are made from a single branch, `master`.

Because we do want to be able to release _for_ multiple release branches
(to provide backports and bugfixes), we still need some way to
accommodate that. Having a single `LATEST` file in the same format as
before would not really work well: keeping track of interleaved release
streams in a single file would be no easier than keeping track
of multiple branches.

My proposed solution is to instead have a multiline LATEST file, so that
all the release branch "tips" can be observed at the same time, and, as
long as we take care to only advance one release branch at a time, we
can easily keep track of each of them. This is what this PR does.

This required a few changes to our release process. Most notably:

- Obviously, as this is the main point of this PR, the build process has
  once again been restricted to only trigger new releases from the
  `master` branch.
- As our CI machinery cannot easily be made to produce multiple releases
  from a single build, the `check_for_release` step will only recognize
  a commit as a release trigger if it changes a single line in the
  `LATEST` file (see the sketch after this list). This restriction comes
  in addition to the existing one that a release commit is only allowed
  to change either just the `LATEST` file or both the `LATEST` and
  `docs/source/support/release-notes.rst` files.
- The docs publication process has been changed to update _all_
  published versions to display the _latest_ release notes page. This
  means that the release notes page will always show you all published
  versions, regardless of which version of the documentation you're
  looking at. This also means that interleaving release notes correctly on
  that page is a manual exercise.
- As per the intention of the new process, the `LATEST` file has been
  updated to contain all existing post-1.0 stable releases. It should
  also include all existing snapshot releases, should we have more than one
  at a time (say, should we discover an issue with 1.1.1 that required us
  to work on a 1.1.2).
- The `release.sh` script has been dramatically simplified as I felt it
  was trying to do too much and porting its existing functionality to a
  multi-line `LATEST` file would be too hard.
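
A rough sketch of that single-line gate (the real `check_for_release`
step lives in the CI config and may look different):

```
#!/usr/bin/env bash
set -euo pipefail
# A commit only counts as a release trigger if it changes exactly one
# line in LATEST: at most one added and at most one removed line.
stat="$(git diff --numstat HEAD~1 HEAD -- LATEST)"
added="$(echo "$stat" | awk '{print $1+0}')"
removed="$(echo "$stat" | awk '{print $2+0}')"
if [[ -n "$stat" && "$added" -le 1 && "$removed" -le 1 ]]; then
  echo "single-line LATEST change: release trigger"
else
  echo "not a release trigger"
fi
```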

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-19 19:18:10 +02:00