digital-asset/daml - daml - gitea: Gitea Service

mirror of https://github.com/digital-asset/daml.git synced 2024-11-10 10:46:11 +03:00

Author	SHA1	Message	Date
Moritz Kiefer	b53bb7bd53	Fix notify_user job for releases (#7096 ) Bash is hard and I hate shell scripts changelog_begin changelog_end	2020-08-12 08:48:19 +00:00
Gary Verhaegen	00f3de63c9	rotate responsibility for release process (#7011 ) This PR attempts to add some automation around assigning release management. The PR adds a file `release/rotation`; each week, the updated CI cron job will: - Open a PR for the new release [as current]. - Assign the first user in the file to that PR. - Add the Standard-Change label to the PR. - Start the build for that PR [as current]. - Open a new PR that rotates the `release/rotate` file, i.e. pushes back the first line to the end of the file. This PR also adds mentions of the "release handler" (the first line of `release/rotation`) to the various messages we send to Slack along the release process. The initial state of the `release/rotation` file has been created by listing all the volunteers (Language team, Application Runtime team, as well as @SamirTalwar-DA and @stefanobaghino-da) and piping the file through `shuf`. (Then I put myself at the top so I can hopefully iron out the issues with the first attempt.) CHANGELOG_BEGIN CHANGELOG_END	2020-08-05 18:58:56 +02:00
Gary Verhaegen	e4482767af	fix release job (#6959 ) This failed on [the master build for the latest release][0] (trigger commit `08d4bb0a21`), fortunately after everything was done so the only consequence is a red tick in the commit list. [0]: https://dev.azure.com/digitalasset/daml/_build/results?buildId=50840&view=logs&jobId=8d802004-fbbb-5f17-b73e-f23de0c1dec8&j=8d802004-fbbb-5f17-b73e-f23de0c1dec8&t=3f4756f8-86d7-526f-1a17-d1c7745ae68d CHANGELOG_BEGIN CHANGELOG_END	2020-08-03 12:35:51 +02:00
Moritz Kiefer	9ed9970493	Avoid dev-env for posting to slack (#6914 ) This is running on Azure’s agents and we just call curl so there is really no need for dev-env (as evidenced by the fact that the message got sent despite dev-env failing). changelog_begin changelog_end	2020-07-29 15:22:23 +02:00
Samir Talwar	11bc582a9e	Publish DAML on SQL to GitHub Releases. (#6876 )	2020-07-27 17:33:20 +02:00
Gary Verhaegen	8043756883	better release triggers (#6859 ) Based on feedback from @nickchapman-da, this PR aims at making the release process easier by: - Automatically opening a release PR on Wednesday morning. The goal here is that by the time we start working, there is a release already built, so we save about an hour on waiting for that. This obviously doesn't help with ad-hoc releases. - On a release PR build, posting to Slack when the release is ready to merge. - On a release master build, posting to Slack when a release is ready to be tested. My hope is that this makes the release process less tedious. This is not trying to address the actual release testing, but hopefully should reduce the annoyance of having to constantly go and check if the release is ready. CHANGELOG_BEGIN CHANGELOG_END	2020-07-24 16:40:11 +00:00
Gary Verhaegen	34ca4f7219	run tests on auto-compat pr (#6755 ) While closely following the 1.3 release through our pipeline to check that #6709 worked as expected, I realized that the automatically-created PR does not start the normal tests either, presumably because it's been opened by a bot. The bot doe shave write access to the repo (obviously, as it can create the PR in the first place), but somehow that doesn't seem to count as a PR with write access for Azure. So this PR adds the normal test run too, so we don't need to manually say `/azp run` on the PR. CHANGELOG_BEGIN CHANGELOG_END	2020-07-16 11:36:22 +00:00
Moritz Kiefer	da995886bb	Fi setvar call in compat update PR (#6734 ) The revert that removed the use of lib.sh broke this again. changelog_begin changelog_end	2020-07-15 10:42:16 +02:00
Moritz Kiefer	52b9eabbcc	Revert "refactor ci jobs: add setvar to ci/lib.sh (#6708 )" (#6732 ) This reverts commit `61e9df3eaf`. This interacts very badly with the fact that we check out old commits for releases. While we could fix it for this particular issue, I don’t think this buys us enough to make this worth doing and it makes it easy to introduce issues in the future if we modify lib.sh changelog_begin changelog_end	2020-07-14 23:53:49 +02:00
Gary Verhaegen	6ea1afc247	run full compat tests on compat tests update PR (#6709 ) This asks Azure to run a full compat test against the branch that updates the compat test matrix, so we know it's good when we merge it rather than the next morning. CHANGELOG_BEGIN CHANGELOG_END	2020-07-13 17:46:32 +02:00
Gary Verhaegen	61e9df3eaf	refactor ci jobs: add setvar to ci/lib.sh (#6708 ) CHANGELOG_BEGIN CHANGELOG_END	2020-07-13 17:34:54 +02:00
Gary Verhaegen	6366fb753d	fix auto compat update PR (#6706 ) This PR fixes a few things with the script that automatically updates the compat test matrix on release, based on its use over the past two weeks. Specifically: - Send an error message to Slack in case this job fails. Previously, failures here were silent. - Add an exponential backoff strategy to wait for the artifact to be available on Maven. Previously, the update script just failed. - Allow for rerunning on the same machine after failure by removing the branch if it exists. - Fix the commit message to include proper newlines instead of literal `\n`'s. CHANGELOG_BEGIN CHANGELOG_END	2020-07-13 15:52:50 +02:00
Moritz Kiefer	631ed3e891	Bump timeouts in compat tests (#6689 ) This bumps the timeout of the compat tests on PRs to 360 minutes matching other jobs on a PR (we mainly hit this if ghc-lib is rebuilt) and the timeout on the daily jobs to 720 minutes (we hit this if _everything_ is rebuilt). I am slightly worried about the timeout on the daily job. After having taken a look at it, there are a few reasons how we ended up here: 1. We started including more tests, e.g., sandbox-classic. Not much we can do here, those tests are useful. 2. We have a very large number of snapshots for 1.3.0. There are a few reasons for this: 1. Timing: We branched off early for the 1.2.0 release so the first snapshot for 1.3 was on June 3th. For 1.4 it looks like the first snapshot will be on July 15th so that’s roughly 2 extra snapshots just due to timing. 2. Additional snapshots: We had one broken snapshot due to a broken VSCode extension that we didn’t delete (probably not worth doing at this point). We also had to backport to an old snapshot which resulted in another extra snapshot. We also had one extra snapshot which was supposed to be the RC but wasn’t since the ANF revert needed to go in. The only thing that is clearly useless is the one broken snapshot but that doesn’t change things that much. I see 2 orthogonal options for improving this assuming we agree that the current runtime is worryingly high. 1. Prune snapshots more aggressively, e.g., only include the last 3 snapshots. That’s a pretty arbitrary decision but it would enforce a hard limit. 2. Reduce test combinations. E.g., only test snapshots vs stable releases but not snapshots vs snapshots. 3. We end up forcing a full build quite frequently. Here are just 2 examples of how we’ve done that so far. 1. Upgrade rules_haskell. Basically all tests are run by a Haskell binary so this forces a full rebuild. 2. Change runfiles of `daml`. I don’t think there is much we can do about 1 or 3 which leaves us with 2. One not entirely unreasonable option is to just do nothing. We did have periods where things went pretty smoothly for the most part and each month we reset to a much smaller number of releases (we also have to start throwing out old stable releases at some point). Otherwise reducing the number of test combinations seems the most promising option to me. changelog_begin changelog_end	2020-07-10 12:34:53 +00:00
Moritz Kiefer	1b533561b4	Only publish JSON API to GH releases (#6620 ) daml-on-sql isn’t quite ready changelog_begin changelog_end	2020-07-06 14:43:09 +00:00
Moritz Kiefer	bc3f485b9a	Update maven_install.json in compatibility tests (#6555 ) We take our own libraries from latest_stable_version which changed but we did not rerun pinning which meant that this did not get updated. changelog_begin changelog_end	2020-07-01 00:49:53 +00:00
Gary Verhaegen	beb33f2ab1	add explanation for clearing shared segments (#6545 ) As requested on #6530. CHANGELOG_BEGIN CHANGELOG_END	2020-06-30 13:21:32 +00:00
Gary Verhaegen	55776f92ba	clear shared memory segment on macOS (#6530 ) For a while now we've had errors along the line of ``` FATAL: could not create shared memory segment: No space left on device DETAIL: Failed system call was shmget(key=5432001, size=56, 03600). HINT: This error does not mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached. The PostgreSQL documentation contains more information about shared memory configuration. child process exited with exit code 1 ``` on macOS CI nodes, which we were not able to reproduce locally. Today I managed to, sort of by accident, and that allowed me to dig a bit further. The root cause seems to be that PostgreSQL, as run by Bazel, does not always seem to properly unlink the shared memory segment it uses to communicate with itself. On my machine, running: ``` bazel test -t- --runs_per_test=100 //ledger/sandbox:conformance-test-wall-clock-postgresql ``` and eyealling the results of ``` watch ipcs -mcopt ``` I would say about one in three runs leaks its memory segment. After much googling and some head scratching trying to figure out the C APIs for managing shared memory segments on macOS, I kind of stumbled on a reference to `pcirm` in a comment to some low-ranking StackOverflow answer. It looks like it's working very well on my machine, even if I run it while a test (and therefore an instance of pg) is running. I believe this is because the command does not actually remove the shared memory segments, but simply marks them for removal once the last process stops using it. (At least that's what the manpage describes.) CHANGELOG_BEGIN CHANGELOG_END	2020-06-30 01:40:16 +02:00
Gary Verhaegen	c7ea0a8b08	automatically run update-versions on release (#6479 ) This PR adds an extra post-release job to CI that will run the [`compatiblity/update-versions.sh`][0] script and open a PR with the result. [0]: `cb82a8d6be/compatibility/update-versions.sh` CHANGELOG_BEGIN CHANGELOG_END	2020-06-24 17:02:12 +02:00
Moritz Kiefer	416a568cbd	Release daml-on-sql and JSON API to GH releases (#6397 ) fixes #6384 For now this keeps the GCP bucket as well. I would suggest to keep that for 1.3 and drop it in 1.4 but I don’t feel particularly strongly about this so I’m also happy to drop it now. changelog_begin - [SDK] The JSON API and DAML on SQL (sandbox-classic) are now published as fat JARs to github releases. The GCP bucket that contained the fat JARs will not receive releases > 1.3. changelog_end	2020-06-18 13:08:18 +02:00
Moritz Kiefer	2c1d4cb805	Fix nix installation (#6400 ) Nix now requires -L, I’ve gone ahead and just normalized everything to use -sfL which we were already using in one place. changelog_begin changelog_end	2020-06-18 10:34:08 +02:00
Gary Verhaegen	7735acb833	add sha256sums to releases (#6263 ) Based on discussion on #6258. CHANGELOG_BEGIN CHANGELOG_END	2020-06-08 14:19:04 +00:00
Gary Verhaegen	2fe320fe48	automated ghc-lib build (#6188 ) automated ghc-lib build This PR aims at automating the build of ghc-lib. The current process still has a few manual steps; it needs to be updated because Bintray is going away, so this seemed like a good opportunity to fully automate it. This works like the "patch bazel on Windows" jobs: the filename will contain a hash of the `ci/da-ghc-lib` folder, and the job will run only if the corresponding filename does not yet exist on the GCS bucket. PRs aiming at changing the ghc-lib version will need to run twice: once to create the artifacts, and once to change the `stack-snapshot.yaml` file to match. CHANGELOG_BEGIN CHANGELOG_END	2020-06-04 12:05:03 +02:00
Gary Verhaegen	595f1e278d	fix fat-jar publish (#6046 ) CHANGELOG_BEGIN CHANGELOG_END	2020-05-20 15:08:53 +02:00
Gary Verhaegen	4882327db5	fix release diff (#6042 ) CHANGELOG_BEGIN CHANGELOG_END	2020-05-20 11:58:33 +02:00
Gary Verhaegen	fb6dc904a4	trigger all releases from master (#6016 ) trigger all releases from master The 1.1.0 release went wrong and we had to trash it and release 1.1.1 instead. This is an attempt at identifying and correcting the root cause behind that incident. To understand the situation, we need to know how releases worked before 1.0. We had a one-line file called `LATEST` that specifies the git SHA and version tag for the latest release. A change to that file triggered a release with the specified release tag, built from the source tree of the specified commit. The `LATEST` file looked something like: ``` `f050da78c9` 1.0.0-snapshot.20200411.3905.0.f050da78 ``` To mark a release as stable, we would change it to look like this: ``` `f050da78c9` 1.0.0 ``` i.e. simply drop the `-snapshot...` suffix. Even though the commit (and thus the entire source tree we build from) is the same, we would need to rebuild almost all of our release artifacts, as they embed the version tag in various places and ways. That worked well as long as we could assume we were doing trunk-based development, i.e. all releases would always come from the same (`master`) branch. When we released 1.0, and started work on 1.1, we had a few bug reports for 1.0 that we decided should be resolved in a point release. We decided that the best way to handle that would be to have a branch starting on the release commit for 1.0, and then backport patches from `master` to that branch. We adapted our build process to also watch the `release/1.0.x` branch and, in particular, trigger a new release build if the `LATEST` file in that branch changed. That worked well. The plan going forward was to keep doing regular snapshot releases from the `master` branch, and create support, point releases ("patch" releases in semver) from dedicated branches. On April 30, we made a snapshot release as an RC for 1.1.0, by changing the `LATEST` file in the `master` branch. That release was built on commit `681c862d`. On May 6, we decided to take a new snapshot as the RC for 1.1.0; we changed `LATEST` in `master` to designate `7e448d81` as the new latest release. On May 11, we noticed an issue that broke our builds. Without going into details, an external artifact we depend on had changed in incompatible ways. After fixing that on `master`, we reasoned that this would also break the build of the final 1.1.0 release if we just tried to build `7e448d81` again. But as the target release date was May 13, we did not want to take a new snapshot after that fix, as that would have included one more week of work in the release, and given us no time to test it. So we did what we did for the 1.0 branch, as it had worked well: we created a branch that forked from `master` at commit `7e448d81` and called it `release/1.1.x`, then cherry-picked the one fix to our build process to work around the broken download. When the time came to make the final 1.1.0 build on May 13, we naturally picked the `LATEST` file from the `release/1.1.x` branch and dropped the `-snapshot...` suffix. Importantly, we did not need to update the target commit to include the "broken download" fix as, in the meantime, the internet had fixed itself, and we thus reasoned we should go for the exact code of the RC rather than include an unnecessary, albeit seemingly harmless, change. Everything went well with the release process. Tests went well too. Then we got a report that an application that worked against the latest RC broke with the final 1.1.0. The issue was that we had built the wrong commit: by branching off at the point of the _target_ commit for the latest snapshot, we did not have the change to the `LATEST` file that designated that commit as the target. So the `LATEST` file in `release/1.1.x` was still pointing to `681c862d`. I believe the root cause for this issue is the fact that we have scattered our release process over multiple branches, meaning there is no linear history of what was released and we are relying on people being able to mentally manage multiple timelines. Therefore, I propose to fix our release process so this should not happen again by linearizing the release process, i.e. getting back to a situation where all releases are made from a single branch, `master`. Because we do want to be able to release _for_ multiple release branches (to provide backports and bugfixes), we still need some way to accommodate that. Having a single `LATEST` file in the same format as before would not really work well: keeping track of interleaved release streams on a single file would not really be easier than keeping track of multiple branches. My proposed solution is to instead have a multiline LATEST file, so that all the release branch "tips" can be observed at the same time, and, as long as we take care to only advance one release branch at a time, we can easily keep track of each of them. This is what this PR does. This required a few changes to our release process. Most notably: - Obviously, as this is the main point of this PR, the build process has once again been restricted to only trigger new releases from the `master` branch. - As our CI machinery cannot easily be made to produce multiple releases from a single build, the `check_for_release` step will only recognize a commit as a release trigger if it changes a single line in the `LATEST` file. This restriction comes in addition to the existing one that a release commit is only allowed to change either just the `LATEST` file or both the `LATEST` and `docs/source/support/release-notes.rst` files. - The docs publication process has been changed to update _all_ published versions to display the _latest_ release notes page. This means that the release notes page will always show you all published versions, regardless of which version of the documentation you're looking at. This also means that interleaving release notes correctly on that page is a manual exercise. - As per the intention of the new process, the `LATEST` file has been updated to contained all existing post-1.0 stable releases. It should also include all existing snapshot releases should we have more than one at a time (say, should we discover an issue with 1.1.1 that required us to work on a 1.1.2). - The `release.sh` script has been dramatically simplified as I felt it was trying to do too much and porting its existing functionality to a multi-line `LATEST` file would be too hard. CHANGELOG_BEGIN CHANGELOG_END	2020-05-19 19:18:10 +02:00
Gary Verhaegen	af939a7ee4	provisional beta deployment for daml-on-sql (#6024 ) Note: this is beta-level software. See documentation for the precise guarantees this does and does not come with. (Documentation does not exist at the time of opening this PR, but should exist by the time the first version of this gets published.) CHANGELOG_BEGIN - We now publish Sandbox Next as an ALPHA standalone jar. - We now publish the HTTP JSON API as a standalone jar. CHANGELOG_END	2020-05-19 18:11:26 +02:00
Moritz Kiefer	294c881a2a	Fix standard change check (#5958 ) This check never triggered for changes to LATEST due to the trailing slash in `has_changed`. changelog_begin changelog_end	2020-05-13 13:58:43 +02:00
Moritz Kiefer	4916a28682	Include create-daml-app tests in compatibility tests (#5945 ) This is the first part of #5700 It adds tests that build create-daml-app using `daml build` and then run the codegen and build the UI. Contrary to our main tests these also run on Windows. This is actually reasonably simple by first building the typescript libraries on Linux and then downloading them on Windows. There are two parts that are still missing from the tests in the main workspace: 1. Building the extra feature. This should be fairly easy to add. 2. Running the pupeeter tests. At least MacOS and Linux should be reasonably easy. I don’t know what horrors Windows will throw at us. This step is what actually makes this a compatibility test. Currently it doesn’t actually launch Sandbox and the JSON API. Since this PR is already pretty large, I’d like to tackle those things separately. changelog_begin changelog_end	2020-05-13 10:39:51 +02:00
Gary Verhaegen	bda565fa44	patching Bazel on Windows (infra bits, no patch yet) (#5918 ) patch Bazel on Windows (ci setup) We have a weird, intermittent bug on Windows where Bazel gets into a broken state. To investigate, we need to patch Bazel to add more debug output than present in the official distribution. This PR adds the basic infrastructure we need to download the Bazel source code, apply a patch, compile it, and make that binary available to the rest of the build. This is for Windows only as we already have the ability to do similar things on Linux and macOS through Nix. This PR does not contain any intresting patch to Bazel, just the minimum that we can check we are actually using the patched version. CHANGELOG_BEGIN CHANGELOG_END	2020-05-12 23:16:04 +02:00
Gary Verhaegen	3899a59a11	switch back to hosted macOS nodes (#5935 ) CHANGELOG_BEGIN CHANGELOG_END	2020-05-11 22:59:33 +02:00
Gary Verhaegen	9b476416b8	switch back to Azure-provided macos nodes (#5920 ) This is temporary. It looks like the macOS nodes are dead; @nycnewman is looking into it, but in case he doesn't fix it in time, at least we have a backup plan so we're not completely blocked on Monday. CHANGELOG_BEGIN CHANGELOG_END	2020-05-11 09:28:40 +02:00
Gary Verhaegen	4a6ab84b69	add default machine capability (#5912 ) add default machine capability We semi-regularly need to do work that has the potential to disrupt a machine's local cache, rendering it broken for other streams of work. This can include upgrading nix, upgrading Bazel, debugging caching issues, or anything related to Windows. Right now we do not have any good solution for these situations. We can either not do those streams of work, or we can proceed with them and just accept that all other builds may get affected depending on which machine they get assigned to. Debugging broken nodes is particularly tricky as we do not have any way to force a build to run on a given node. This PR aims at providing a better alternative by (ab)using an Azure Pipelines feature called [capabilities](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#capabilities). The idea behind capabilities is that you assign a set of tags to a machine, and then a job can express its [demands](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml), i.e. specify a set of tags machines need to have in order to run it. Support for this is fairly badly documented. We can gather from the documentation that a job can specify two things about a capability (through its `demands`): that a given tag exists, and that a given tag has an exact specified value. In particular, a job cannot specify that a capability should _not_ be present, meaning we cannot rely on, say, adding a "broken" tag to broken machines. Documentation on how to set capabilities for an agent is basically nonexistent, but [looking at the code](https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Microsoft.VisualStudio.Services.Agent/Capabilities/UserCapabilitiesProvider.cs) indicates that they can be set by using a simple `key=value`-formatted text file, provided we can find the right place to put this file. This PR adds this file to our Linux, macOS and Windows node init scripts to define an `assignment` capability and adds a demand for a `default` value on each job. From then on, when we hit a case where we want a PR to run on a specific node, and to prevent other PRs from running on that node, we can manually override the capability from the Azure UI and update the demand in the relevant YAML file in the PR. CHANGELOG_BEGIN CHANGELOG_END	2020-05-09 18:21:42 +02:00
Gary Verhaegen	2b0c59f8af	detect cancellation in notify-user (#5895 ) This PR changes the notify_user job to not run when the job has been canceled, which happens mostly when we push new code. Not sure how I failed to see the `canceled` function in the past, but this does seem to do exactly what we want. CHANGELOG_BEGIN CHANGELOG_END	2020-05-07 19:35:45 +02:00
Gary Verhaegen	11a2fc3c2d	more flexible perf test check (#5891 ) This PR separates the "last known valid perf test" commit from the "baseline speedy implementation" commit. It is important for the perf test to be meaningful that the changes between those two commits are benign, say minor API adjustments, so that the perf measurement remains meaningful. This also adds a check on merging to master that tells Slack if the perf test has changed and the `test_sha` file needs updating. The Slack message is conditional on the current commit to avoid excessive noise. CHANGELOG_BEGIN CHANGELOG_END	2020-05-07 13:53:22 +02:00
Moritz Kiefer	b291e96ce1	Publish execution logs from Windows compatibility jobs (#5834 ) Hopefully, this helps diagnose the Windows CI failures. changelog_begin changelog_end	2020-05-05 12:23:11 +02:00
Gary Verhaegen	49b4a8dad8	tweak release process for more reliable labeling (#5823 ) Currently, there are quite a few releases that are lacking the Standard-Change label, even though they did publish artifacts. This makes our SOC2-compliance tracking a bit harder. For the past two months, I have manually added the label after-the-fact while preparing the monthly compliance report, but that doesn't seem like a great solution. This PR changes the release process to be more optimistic: assume the release is going to succeed by putting in the label immediately, and then (optionally) removing it if the release fails. Note that the label should only be removed in the rare case where the release was merged into master but somehow did not produce any artifact. This can only happen if the Linux build fails quite early, which as far as I know only happened once over the past two months when we had the release notes race condition. CHANGELOG_BEGIN CHANGELOG_END	2020-05-04 15:35:58 +02:00
Andreas Herrmann	4c99f67814	Publish Bazel logs (#5821 ) CHANGELOG_BEGIN CHANGELOG_END Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>	2020-05-04 11:14:40 +02:00
Gary Verhaegen	32fbf040aa	fail collect_build_data for windows_compat (#5779 ) At the moment, collect_build_data will wait for the Windows compatibility test to have "finished", but doesn't check its return status. This means two things: 1. Should the compatibility test end without a success or error (e.g. communication broken between Azure and the node), the option to rerun failed jobs will not appear, as there will be no failed job. 2. The subsequent notify_user step will ignore failures in the compatibility_windows job when reporting to Slack, making for confusing reports. CHANGELOG_BEGIN CHANGELOG_END	2020-04-30 15:03:03 +02:00
Moritz Kiefer	49e19ebed1	Make compat tests work on windows (#5732 ) * Make compat tests work on windows This required some changes to the daml_sdk rule since the read-only installation by the assistant breaks Bazel completely. We could only apply those changes on Windows but I think I prefer the consistency across platforms here over trying to stay close to how the SDK is installed on user machines given that the SDK installation is not something we’ve had issues with. I’ve excluded the postgresql tests for now. I don’t expect them to be particularly hard to fix but I’ve already spent almost 2 days on this and having some tests run on Windows seems like a clear improvement over running no tests on Windows :) changelog_begin changelog_end * Remove todo changelog_begin changelog_end	2020-04-28 16:06:36 +02:00
Gary Verhaegen	54d6782be3	drop v from release titles (#5742 ) This is a minor, cosmetic change. Note that all our references to releases are based on tags, and do not depend on the release title. This is evidenced by the fairly random titles we used to have before the title was set by CI, see e.g. [0.13.35](https://github.com/digital-asset/daml/releases/tag/v0.13.53). CHANGELOG_BEGIN CHANGELOG_END	2020-04-28 11:51:16 +02:00
Gary Verhaegen	7ceda5678a	run compatibility tests on macos (#5723 ) This PR extends the existing Linux compatibility tests to run on macOS too. Fixes #5692. CHANGELOG_BEGIN CHANGELOG_END Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>	2020-04-27 14:55:16 +02:00
Moritz Kiefer	0d1f21e4a2	Extend compatibility tests to test against HEAD (#5714 ) fixes #5691 changelog_begin changelog_end	2020-04-24 14:43:35 +02:00
Moritz Kiefer	df16cf7094	Extend compatibility tests to DAML on SQL (#5705 ) * Extend compatibility tests to DAML on SQL This feels a bit hacky since the runfiles don’t work quite like I would expect them to but it’s at least not more hacky than what we do for the head-based tests we currently have. Progress towards #5695 changelog_begin changelog_end * Fix runfiles with more bash changelog_begin changelog_end * remove redundant port options changelog_begin changelog_end * Create fewer sandbox targets changelog_begin changelog_end * Apply suggestions from code review Co-Authored-By: Andreas Herrmann <42969706+aherrmann-da@users.noreply.github.com> * Fix runfiles snippet changelog_begin changelog_end Co-authored-by: Andreas Herrmann <42969706+aherrmann-da@users.noreply.github.com>	2020-04-24 12:08:32 +02:00
Moritz Kiefer	7d36402412	Initial boilerplate for cross-version compatibility testing (#5665 ) This is a first step towards testing cross-version compatibility. It doesn’t actuall do much yet but hopefully it should be easier to parallelize once we have the initial boilerplate in place so ideally I’d like to address most missing things and issues in separate PRs. changelog_begin changelog_end	2020-04-23 12:58:11 +02:00
Gary Verhaegen	88c389c17a	enable patch releases (fix) (#5634 ) This is applying, on `master`, the same patch as #5605 applied on `release/1.0.x`. CHANGELOG_BEGIN CHANGELOG_END	2020-04-20 17:01:08 +02:00
Gary Verhaegen	7fb0c8c3ac	fix check_standard_change_label (#5600 ) As currently written, the check will compare the current commit with the base commit the branch was forked from. The intention was to only list the changes from the current PR, and to make the check work for PRs against release branches (the previous version of the heck always compared to master, regardless of what branch the PR was targeting). However, this does not work as expected because the "current commit" is not the tip of the PR, but the merge commit supplied by GitHub. Therefore, the diff here will include not only the changes in this PR, but also all the changes that happened on the target branch since forking. This is not an issue if the PR is properly rebased, but that's hard to control in a world where other people work too. This PR corrects this by explicitly computing the diff between the fork point on the target branch and the tip of the PR, ignoring the currently-checked-out commit. CHANGELOG_BEGIN CHANGELOG_END	2020-04-17 13:54:18 +02:00
Gary Verhaegen	a1fab2d9af	enable patch releases (#5584 ) This commit aims at enabling future patch releases; it is the master-branch equivalent of #5569 (applied to the 1.0 release branch). The only change between the two changelogs should be that this one also changes the docs cron so it can find the trigger commits for patch releases. CHANGELOG_BEGIN CHANGELOG_END	2020-04-16 17:50:55 +02:00
Gary Verhaegen	466fe1b518	switch to home-hosted macos nodes (#5543 ) CHANGELOG_BEGIN CHANGELOG_END	2020-04-14 18:18:30 +02:00
Gary Verhaegen	033b798009	cleanup collect_build_data job (#5548 ) I recently noticed that the `check_for_release` and `check_standard_change_label` jobs do not currently report their runtime, so including them in the build data is a bit moot (we always get `""` for all three values). Given that they usually run in under 3 seconds, I've decided the best way to fix this is to remove them from the build data, rather than add the required steps to collect their build times. CHANGELOG_BEGIN CHANGELOG_END	2020-04-14 15:33:08 +00:00
Gary Verhaegen	7d55095f17	reenable collect_build_data and notify_user (#5545 ) Looking at the behaviour of `succeededOrFailed`, it looks like it does not do what we want at all: both steps now only run on failures. My current hypothesis is that `write_ledger_dump` being skipped switches the state of the last job to something that is neither success nor failure. It would be really nice if Azure had a way to detect cancellation. :( CHANGELOG_BEGIN CHANGELOG_END	2020-04-14 11:29:12 +00:00
Gary Verhaegen	1780466330	cleanup collect_build_data & notify_user steps (#5491 ) Over the past three weeks or so, I have not seen a single case where the "get commit" step failed erroneously, i.e. all failures were genuine pushes, which we don't care about. Therefore, I hacve decided to remove the `tell_gary` code, as it is long, a bit hairy, and duplicated. In the meantime I've also discovered there actually is a way to tell Azure not to run these steps on a canceled build, which I believe makes sense, so I've added that. CHANGELOG_BEGIN CHANGELOG_END	2020-04-08 17:13:36 +02:00
Gary Verhaegen	8261af86d2	correct commit title in Slack msg (#5471 ) Currently, on a release commit on master, if the commit fails, we get the message from the target PR, which is confusing. This should (hopefully; it's a bit hard to test as it would require setting up a release PR that succeeds but fails on master) get us the title of the release commit, which hopefully will be less confusing. CHANGELOG_BEGIN CHANGELOG_END	2020-04-08 13:01:42 +02:00
Moritz Kiefer	be3d8bc301	Publish protobuf zip to github releases (#5418 ) I cannot test this without actually making a release but it is all copy-pasted from other targets so hopefully it works. changelog_begin changelog_end	2020-04-03 15:36:41 +02:00
Gary Verhaegen	1872c668a5	replace DAML Authors with DA in copyright headers (#5228 ) Change requested by Manoj. CHANGELOG_BEGIN CHANGELOG_END	2020-03-27 01:26:10 +01:00
Moritz Kiefer	0ed3bbf2ce	Bump nix version (#4934 ) Using the newest seems like a good idea and the previous one has network errors changelog_begin changelog_end	2020-03-11 12:26:29 +00:00
Gary Verhaegen	ef931e0b72	skip testing release script after making a release (#4911 ) Currently, on Linux, after the normal build, we try running the release script (in "dry run" mode). This is to check that the release script not only compiles, but actually runs. To be honest I'm not entirely sure why we do that as a separate step (i.e. why does `bazel test //...` not give us confidence about this script?), but the point of this PR is that, while there may be some benefit in running this script on normal PRs to check that we have not broken the release step, there is absolutely no point in running it _on a release build_, i.e. right after we've used the same script in "real" ("wet run"? 🤔) mode. CHANGELOG_BEGIN CHANGELOG_END	2020-03-10 12:45:14 +01:00
Gary Verhaegen	41643315ac	hopefully fix Azure pipelines github tag release (#4912 ) Maybe. If I'm reading [that](https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/github-release?view=azure-devops) right. CHANGELOG_BEGIN CHANGELOG_END	2020-03-09 19:39:08 +01:00
Moritz Kiefer	52bf9b2a5c	Fix git tag of releases (#4862 ) Previously, we tagged the commit that made the release instead of the commit we are building the release off. changelog_begin changelog_end	2020-03-06 10:54:07 +01:00
Gary Verhaegen	950d8c3501	remove perf tests from CI (#4851 ) CHANGELOG_BEGIN CHANGELOG_END	2020-03-05 17:29:28 +01:00
Gary Verhaegen	3a7dca3286	remove exponential backoff for getting sha (#4748 ) This has been running for a few days now and while I have seen a bunch of these cases, I have not once received a message with a BACKOFF value different from 512. This means that, likely due to some sort of internal caching in Azure, retrying in this case is useless and just makes the build failure take more time, i.e. more time before we can rerun. Rerunning does usually solve it, though. I have also noticed that we still get these notifications when the job has been canceled, which usually means the user has force-pushed (in which case it makes sense that the commit is no longer available). I'm not sure we can detect this, but I take this opportunity to print the JobStatus just in case. CHANGELOG_BEGIN CHANGELOG_END	2020-02-27 15:14:21 +01:00
Gary Verhaegen	86bce50b9a	fix passing is_release through (#4745 ) Somehow, in the current setup, the publish steps do not get executed on master. This is what Azure reports: ``` Evaluating: and(succeeded(), eq('$(is_release)', 'true'), eq(variables['Build.SourceBranchName'], 'master'), eq('linux', 'linux')) Expanded: and(True, eq('$(is_release)', 'true'), eq(variables['Build.SourceBranchName'], 'master'), eq('linux', 'linux')) Result: False ``` So it looks like, in the condition, `${{parameters.is_release}}` evaluates to the literal string `$(is_release)`. If we look at the point of invocation of the ~function~ template, we can see: ``` - template: ci/build-unix.yml parameters: release_tag: $(release_tag) name: 'linux' is_release: $(is_release) ``` so it does not seem completely crazy. However, according to the documentation, we should expect that to be replaced by the value of the corresponding variable, as per: ``` variables: release_sha: $[ dependencies.check_for_release.outputs['out.release_sha'] ] release_tag: $[ coalesce(dependencies.check_for_release.outputs['out.release_tag'], '0.0.0') ] trigger_sha: $[ dependencies.check_for_release.outputs['out.trigger_sha'] ] is_release: $[ dependencies.check_for_release.outputs['out.is_release'] ] ``` What's interesting here is that, within `build-unix.yml`, we are also using `release_tag` in the exact same way: ``` - bash: ./build.sh "_$(uname)" displayName: 'Build' env: DAML_SDK_RELEASE_VERSION: ${{parameters.release_tag}} ``` and this time output from the build seems to show the value being correctly substituted: ``` damlc - Compiler and IDE backend for the Digital Asset Modelling Language SDK Version: 0.13.55-snapshot.20200226.3266.d58bb459 Usage: <interactive> COMMAND Invoke the DAML compiler. Use -h for help. ``` My current guess is that the (undocumented, as far as I can tell) evaluation order is as follows: 1. In the template, syntactically replace all the parameters. 2. In the job definition, replace the call to the template with the code of the template. So it is as if we had written the template directly in the `azure-pipelines.yml` file, with `$(release_tag)` and `$(is_release)`. 3. Run the build. When we reach the time to run this specific job, we can evaluate the expressions for the variables and replace them in the rest of the job. So what is going wrong? I believe the issue is with the quotes, preventing the substitution of `is_release`. They came directly from the [documented syntax](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/conditions?view=azure-devops&tabs=yaml#use-a-template-parameter-as-part-of-a-condition), but if the above evaluation order is correct, they should not be there. There are actually two things going wrong here. The first one is that the syntax `$()` is used to substitute a value in what Azure considers a string. This is the case for `env` keys. However, the `condition` key is not a string, it is an Azure "expression". Expressions have their own evaluation rules and syntax, and in particular, `$()` is not a substitution rule there, so when it sees `$()` in a string in an expression (due to the quoptes), it leaves it alone. Removing the quotes does not directly help, though, as we then end with ``` condition: eq($(is_release), 'true') ``` and `$()` is not valid syntax in an expression. The way to use variables in an expression is `variables.name` (or `variables["name"]`, because why have only one?). So that means we have to pass variables to the template in different ways depending on how they will be used. So much fun. CHANGELOG_BEGIN CHANGELOG_END	2020-02-27 14:33:20 +01:00
Gary Verhaegen	1c2b921c14	fix tarball generation (#4738 ) The existing approach is a historical accident. The reason for the additional tarball/install jobs was that, in my original attempt, the build steps would still build the current commit, as opposed to the target commit. This is not such an issue on Linux, but setting up the build environment on Windows and macOS _again_ for no good reason is a pure waste of time (and effort in getting it right). Now that the build steps build the target commit (with the env var set), we can go back to the way things were previously: just take the build products directly from the build step. CHANGELOG_BEGIN CHANGELOG_END	2020-02-27 10:48:38 +01:00
Moritz Kiefer	1bab818dea	Prefix release jobs with 'release_' (#4737 ) changelog_begin changelog_end	2020-02-26 20:46:45 +00:00
Moritz Kiefer	e3a1f684ca	Remove Azure nix cache (#4734 ) This doesn’t seem to bring any benefit anymore on cache hits with the reduced closure size and on cache misses it’s significantly slower. changelog_begin changelog_end	2020-02-26 19:43:50 +00:00
Gary Verhaegen	1a1ad793ff	fix tarball steps (#4731 ) Turns out there were two required environment variables here. CHANGELOG_BEGIN CHANGELOG_END	2020-02-26 19:34:27 +01:00
Gerolf Seitz	d617922618	Remove InMemoryKVParticipantState (#4674 ) This removes the sample/reference implementation of kvutils InMemoryKVParticipantState. This used to be the only implementation of kvutils, but now with the simplified kvutils api we have ledger-on-memory and ledger-on-sql. InMemoryKVParticipantState was also used for the ledger dump utility, which now uses ledger-on-memory. * Runner now supports a multi participant configuration This change removes the "extra participants" config and goes for consistent participant setup with --participant. * Run all conformance tests in the repository in verbose mode. This means we'll print stack traces on error, which should make it easier to figure out what's going on with flaky tests on CI. This doesn't change the default for other users of the ledger-api-test-tool; we just add the flag for: - ledger-api-test-tool-on-canton - ledger-on-memory - ledger-on-sql - sandbox Fixes #4225. CHANGELOG_BEGIN CHANGELOG_END	2020-02-26 15:45:35 +01:00
Gary Verhaegen	48f1d9ed87	only run release if build succeeded (#4723 ) The default behaviour of an Azure job that has a dependency is to only run if the dependency has succeeded. However, that default behaviour is overridden if there is an exmplicit `condition` attribute. This PR restores the expected behaviour that we only try to build a release tarball if the actual build has succeeded. CHANGELOG_BEGIN CHANGELOG_END	2020-02-26 14:19:58 +00:00
Gary Verhaegen	616b5234fb	fix tarball scripts (#4721 ) The Bazel configuration step requires the IS_FORK variable to be set. CHANGELOG_BEGIN CHANGELOG_END	2020-02-26 13:47:03 +00:00
Gary Verhaegen	739dc38660	fix release steps (#4713 ) While flailing about randomly trying to reset the Windows cache yesterday, I noticed a couple issues with the current script: - The `fork_point` calculation is just plain broken. Somehow our `set -euo pipefail` does not fail on subshell errors, but the existing command is just never going to work: it looks like `git` does not resolve refs on the `merge-base` command. It also looks like the `--fork-point` option is not what we want. I don't know how this happened. - `sort` on my machine and on CI do not seem to behave the same with respect to upper/lower case ordering. To make the script independent of the specific sort order on the machine (probably controlled by the locale), we now sort both the actual and the expected list. Finally, based on the failure to recognize a release commit once merged into master, I realized that of course computing the diff between a commit and itself will yield an empty diff. The `git_sha` step will now identify the "master" and "fork point" commits as the parent for a master build. CHANGELOG_BEGIN CHANGELOG_END	2020-02-26 12:13:02 +01:00
Gary Verhaegen	3a5c777b35	fix checkout release step (#4701 ) In the current setup, we expose HEAD as the trigger commit, but that is the merge commit with master. Since making a release takes a long time, this merge commit is likely to not exist anymore by the time we want to try a rerun (assuming a flaky build). Release PRs are by definition (in the new system) independent of what's going on on master, so we should instead take the branch commit here when running on a PR. CHANGELOG_BEGIN CHANGELOG_END	2020-02-25 21:52:11 +01:00
Gary Verhaegen	5a117dc358	introduce new release process (#4513 ) Context ======= After multiple discussions about our current release schedule and process, we've come to the conclusion that we need to be able to make a distinction between technical snapshots and marketing releases. In other words, we need to be able to create a bundle for early adopters to test without making it an officially-supported version, and without necessarily implying everyone should go through the trouble of upgrading. The underlying goal is to have less frequent but more stable "official" releases. This PR is a proposal for a new release process designed under the following constraints: - Reuse as much as possible of the existing infrastructure, to minimize effort but also chances of disruptions. - Have the ability to create "snapshot"/"nightly"/... releases that are not meant for general public consumption, but can still be used by savvy users without jumping through too many extra hoops (ideally just swapping in a slightly-weirder version string). - Have the ability to promote an existing snapshot release to "official" release status, with as few changes as possible in-between, so we can be confident that the official release is what we tested as a prerelease. - Have as much of the release pipeline shared between the two types of releases, to avoid discovering non-transient problems while trying to promote a snapshot to an official release. - Triggerring a release should still be done through a PR, so we can keep the same approval process for SOC2 auditability. The gist of this proposal is to replace the current `VERSION` file with a `LATEST` file, which would have the following format: ``` ef5d32b7438e481de0235c5538aedab419682388 0.13.53-alpha.20200214.3025.ef5d32b7 ``` This file would be maintained with a script to reduce manual labor in producing the version string. Other than that, the process will be largely the same, with releases triggered by changes to this `LATEST` and the release notes files. Version numbers =============== Because one of the goals is to reduce the velocity of our published version numbers, we need a different version scheme for our snapshot releases. Fortunately, most version schemes have some support for that; unfortunately, the SDK sits at the intersection of three different version schemes that have made incompatible choices. Without going into too much detail: - Semantic versioning (which we chose as the version format for the SDK version number) allows for "prerelease" version numbers as well as "metadata"; an example of a complete version string would be `1.2.3-nightly.201+server12.43`. The "main" part of the version string always has to have 3 numbers separated by dots; the "prerelease" (after the `-` but before the `+`) and the "metadata" (after the `+`) parts are optional and, if present, must consist of one or more segments separated by dots, where a segment can be either a number or an alphanumeric string. In terms of ordering, metadata is irrelevant and any version with a prerelease string is before the corresponding "main" version string alone. Amongst prereleases, segments are compared in order with purely numeric ones compared as numbers and mixed ones compared lexicographically. So 1.2.3 is more recent than 1.2.3-1, which is itself less recent than 1.2.3-2. - Maven version strings are any number of segments separated by a `.`, a `-`, or a transition between a number and a letter. Version strings are compared element-wise, with numeric segments being compared as numbers. Alphabetic segments are treated specially if they happen to be one of a handful of magic words (such as "alpha", "beta" or "snapshot" for example) which count as "qualifiers"; a version string with a qualifier is "before" its prefix (`1.2.3` is before `1.2.3-alpha.3`, which is the same as `1.2.3-alpha3` or `1.2.3-alpha-3`), and there is a special ordering amongst qualifiers. Other alphabetic segments are compared alphabetically and count as being "after" their prefix (`1.2.3-really-final-this-time` counts as being released after `1.2.3`). - GHC package numbers are comprised of any number of numeric segments separated by `.`, plus an optional (though deprecated) alphanumeric "version tag" separated by a `-`. I could not find any official documentation on ordering for the version tag; numeric segments are compared as numbers. - npm uses semantic versioning so that is covered already. After much more investigation than I'd care to admit, I have come up with the following compromise as the least-bad solution. First, obviously, the version string for stable/marketing versions is going to be "standard" semver, i.e. major.minor.patch, all numbers, which works, and sorts as expected, for all three schemes. For snapshot releases, we shall use the following (semver) format: ``` 0.13.53-alpha.20200214.3025.ef5d32b7 ``` where the components are, respectively: - `0.13.53`: the expected version string of the next "stable" release. - `alpha`: a marker that hopefully scares people enough. - `20200214`: the date of the release commit, which _MUST_ be on master. - `3025`: the number of commits in master up to the release commit (included). Because we have a linear, append-only master branch, this uniquely identifies the commit. - `ef5d32b7ù : the first 8 characters of the release commit sha. This is not strictly speaking necessary, but makes it a lot more convenient to identify the commit. The main downsides of this format are: 1. It is not a valid format for GHC packages. We do not publish GHC packages from the SDK (so far we have instead opted to release our Haskell code as separate packages entirely), so this should not be an issue. However, our SDK version currently leaks to `ghc-pkg` as the version string for the stdlib (and prim) packages. This PR addresses that by tweaking the compiler to remove the offending bits, so `ghc-pkg` would see the above version number as `0.13.53.20200214.3025`, which should be enough to uniquely identify it. Note that, as far as I could find out, this number would never be exposed to users. 2. It is rather long, which I think is good from a human perspective as it makes it more scary. However, I have been told that this may be long enough to cause issues on Windows by pushing us past the max path size limitation of that "OS". I suggest we try it and see what happens. The upsides are: - It clearly indicates it is an unstable release (`alpha`). - It clearly indicates how old it is, by including the date. - To humans, it is immediately obvious which version is "later" even if they have the same date, allowing us to release same-day patches if needed. (Note: that is, commits that were made on the same day; the release date itself is irrelevant here.) - It contains the git sha so the commit built for that release is immediately obvious. - It sorts correctly under all schemes (modulo the modification for GHC). Alternatives I considered: - Pander to GHC: 0.13.53-alpha-20200214-3025-ef5d32b7. This format would be accepted by all schemes, but will not sort as expected under semantic versioning (though Maven will be fine). I have no idea how it will sort under GHC. - Not having any non-numeric component, e.g. `0.13.53.20200214.3025`. This is not valid semantic versioning and is therefore rejected by npm. - Not having detailed info: just go with `0.13.53-snapshot`. This is what is generally done in the Java world, but we then lose track of what version is actually in use and I'm concerned about bug reports. This would also not let us publish to the main Maven repo (at least not more than once), as artifacts there are supposed to be immutable. - No having a qualifier: `0.13.53-3025` would be acceptable to all three version formats. However, it would not clearly indicate to humans that it is not meant as a stable version, and would sort differently under semantic versioning (which counts it as a prerelease, i.e. before `0.13.53`) than under maven (which counts it as a patch, so after `0.13.53`). - Just counting releases: `0.13.53-alpha.1`, where we just count the number of prereleases in-between `0.13.52` and the next. This is currently the fallback plan if Windows path length causes issues. It would be less convenient to map releases to commits, but it could still be done via querying the history of the `LATEST` file. Release notes ============= > Note: We have decided not to have release notes for snapshot releases. Release notes are a bit tricky. Because we want the ability to make snapshot releases, then later on promote them to stable releases, it follows that we want to build commits from the past. However, if we decide post-hoc that a commit is actually a good candidate for a release, there is no way that commit can have the appropriate release notes: it cannot know what version number it's getting, and, moreover, we now track changes in commit messages. And I do not think anyone wants to go back to the release notes file being a merge bottleneck. But release notes need to be published to the releases blog upon releasing a stable version, and the docs website needs to be updated and include them. The only sensible solution here is to pick up the release notes as of the commit that triggers the release. As the docs cron runs asynchronously, this means walking down the git history to find the relevant commit. > Note: We could probably do away with the asynchronicity at this point. > It was originally included to cover for the possibility of a release > failing. If we are releasing commits from the past after they have been > tested, this should not be an issue anymore. If the docs generation were > part of the synchronous release step, it would have direct access to the > correct release notes without having to walk down the git history. > > However, I think it is more prudent to keep this change as a future step, > after we're confident the new release scheme does indeed produce much more > reliable "stable" releases. New release process =================== Just like releases are currently controlled mostly by detecting changes to the `VERSION` file, the new process will be controlled by detecting changes to the `LATEST` file. The format of that file will include both the version string and the corresponding SHA. Upon detecting a change to the `LATEST` file, CI will run the entire release process, just like it does now with the VERSION file. The main differences are: 1. Before running the release step, CI will checkout the commit specified in the LATEST file. This requires separating the release step from the build step, which in my opinion is cleaner anyway. 2. The `//:VERSION` Bazel target is replaced by a repository rule that gets the version to build from an environment variable, with a default of `0.0.0` to remain consistent with the current `daml-head` behaviour. Some of the manual steps will need to be skipped for a snapshot release. See amended `release/RELEASE.md` in this commit for details. The main caveat of this approach is that the official release will be a different binary from the corresponding snapshot. It will have been built from the same source, but with a different version string. This is somewhat mitigated by Bazel caching, meaning any build step that does not depend on the version string should use the cache and produce identical results. I do not think this can be avoided when our artifact includes its own version number. I must note, though, that while going through the changes required after removing the `VERSION` file, I have been quite surprised at the sheer number of things that actually depend on the SDK version number. I believe we should look into reducing that over time. CHANGELOG_BEGIN CHANGELOG_END	2020-02-25 17:01:23 +01:00
Gary Verhaegen	11efaebe8e	tell Gary when Azure fails to fetch commit (#4611 ) We have seen a number of failures recently where the collect_build_data and notify_user steps failed to fetch their branch commit from GitHub. Trying to reproduce locally doesn't work (i.e. fetching the same sha succeeds), so we're currently assuming transient network errors. This PR adds a retry mechanism as well as a Slack message to help me keep tabs on the issue. CHANGELOG_BEGIN CHANGELOG_END	2020-02-20 13:07:53 +01:00
Andreas Herrmann	655e4b3b55	Update CI nix version (#4443 ) * Update CI nix version For `--option http2 false` to take effect requires Nix 2.3.2. CHANGELOG_BEGIN CHANGELOG_END * Set option `http2 = false` dev-env nix config This is less likely to overlook an instance than manually adding `--option http2 false` to each Nix invocation. Setting `--option htt2p false` also had no effect on the multi-user Nix installation on the Linux CI machines due to ``` WARNING: option '--disk_cache' was expanded to from both option '--config linux' (source /nix/store/2xnfb2l39d2b4nxw5vwmqz5hjwhw0caw-daml-bazelrc) and option '--config linux' (source /nix/store/2xnfb2l39d2b4nxw5vwmqz5hjwhw0caw-daml-bazelrc) ``` Co-authored-by: Andreas Herrmann <andreash87@gmx.ch>	2020-02-07 15:05:52 +00:00
Gary Verhaegen	47bd131f15	add copyright headers to yml files (#4407 ) We seem to have forgotten about them in the copyright scripts. CHANGELOG_BEGIN CHANGELOG_END	2020-02-06 12:54:07 +01:00
Gary Verhaegen	babc390869	fix status in Slack notifications (#4376 ) The $(Agent.JobStatus) variable unfortunately only tracks the status of the current _job_, whereas what we really want here is the status of the whole build. Azure does not have an easy way to do that, but fortunately we already have logic to determine the current build status in `collect_build_data`, so here we can just piggyback on that. CHANGELOG_BEGIN CHANGELOG_END	2020-02-04 15:53:39 +01:00
Gary Verhaegen	af37752686	fix ci alerts (#4358 ) PR #4286 introduced new jobs that do not work well when ran against the master branch, rather than as part of a PR. This hopefully fixes that, though it's hard to test for obvious reasons. CHANGELOG_BEGIN CHANGELOG_END	2020-02-03 19:25:37 +01:00
Gary Verhaegen	00224c2480	ci: alert Slack on PR build completion (#4286 ) One of the outputs of our brainstorming about how to make CI better was that it is annoying to have to "babysit" pull requests. This PR attempts to introduce a notification mechanism by which Azure will notify people on Slack when a build finishes, so they know they need to go and rerun or merge the corresponding PR. This commit also changes the existing $Slack.URL variable to $Slack.team-daml, to make more explicit where the Slack message is being sent to (Slack works with one token per destination channel). Both $Slack.URL and $Slack.team-daml are currently defined as the same token in Azure. CHANGELOG_BEGIN CHANGELOG_END	2020-02-03 16:29:13 +01:00
Gary Verhaegen	d7a9d541c2	ci: fix collect_build_data checkout failures (#4144 ) Azure builds on the merge commit provided by GitHub as refs/pull/<pr-number>/merge. Either GitHub recently changed to clear out such commits faster than they used to, or Azure recently changed to cache the resulting commit sha rather than go through the indirection again. Either way, the end result is that, currently, if the other jobs take "long enough", and `master` has changed in-between the build starting and the `collect_build_data` step running, the latter will fail to checkout the commit it is looking for, and the build will irredeemably fail. The only option is to re-run the entire build (`/azp run` or rebase/push), which is sort of the entire opposite of the whole reason for introducing `collect_build_data` in the first place. This patch aims to address this by not relying on Azure to fetch the daml repo in the `collect_build_data` job. This is definitely a hack, but hopefully one that can alleviate the problem for now. CHANGELOG_BEGIN CHANGELOG_END	2020-01-21 19:14:46 +01:00
Gary Verhaegen	fed3e76ed1	fix collect-build-data status (#4001 ) The main goal of this job is to fail when other, required jobs are canceled. The reason for this is that the communication between Azure and GitHub does not always work very well, particularly around canceled jobs, so that when a job gets canceled GitHub does not always know about it, and furthermore GitHub does not provide a "re-run" button for canceled jobs. Thus, this one provides a failed job that doe sdisplay a "re-run" button, which re-runs all the failed/canceled jobs in the current build. Therefore, this only needs to detect canceled jobs, not failed ones (because those will have their own "re-run" button). Additionally, we recently changed the standard-change label check to not run on master (as it really only makes sense on PRs), resulting in a Skipped status instead of Succeeded, which made this job fail. CHANGELOG_BEGIN CHANGELOG_END	2020-01-09 17:31:32 +01:00
Gary Verhaegen	7260de61c4	add more information to collect_build_data output (#3996 ) This commit: - fixes an issue introduced by `e2015e2ec` whereby the data was never saved to GCS, because if the variable _is_ set, then Azure will substitute the right side of the comparison as well as set the environment variable, so the two sides will still be equal. The new approach skips the first character of the env var and removes the dollar from the (intended) literal expression, ensuring Azure does not substitute. - adds output to the failure case because we have recently seen all commits on master fail that step, regardless of success of other jobs. CHANGELOG_BEGIN CHANGELOG_END	2020-01-09 13:03:54 +01:00
Gary Verhaegen	89af1550b1	check for changelog (#3963 ) * check for changelog	2020-01-07 17:19:50 +01:00
Gary Verhaegen	0f3d9a3e5e	ci: improve failed job detection (#3828 )	2019-12-13 11:44:39 +01:00
Gary Verhaegen	e9e3f1bf5d	add check for Standard-Change label (#3493 )	2019-11-15 16:46:42 +00:00
Gary Verhaegen	59d1ad8e22	write ledger dumps on release (#3376 )	2019-11-07 21:55:25 +00:00
Moritz Kiefer	355e32d843	Clean Azure nix cache (#3323 ) Somehow our current cache seems to be broken. Forcing a clean cache by changing the cache key seems to fix this.	2019-11-04 13:30:57 +01:00
Gary Verhaegen	156edf7432	fix release check (#3112 )	2019-10-04 13:57:49 +00:00
Gary Verhaegen	7ef1e63275	restore full has_release check (#2967 )	2019-09-26 15:10:25 +01:00
Gary Verhaegen	afe789f22e	decode gpg credentials (#3008 )	2019-09-25 00:24:02 +01:00
Gary Verhaegen	4576aed986	sign releases (#2968 )	2019-09-24 12:02:29 +01:00
Gary Verhaegen	d2c8d46873	fix build data encoding (#2980 ) There was an issue where multiline commit messages were not encoded at all, yielding invalid JSON. This should fix it. This commit message is designed to test it before it gets to master. $(exit 1) `exit 1` end quote: " invalid json input	2019-09-23 22:02:33 +01:00
Gary Verhaegen	cf13a98d7a	check JSON formatting of build results (#2935 )	2019-09-18 14:00:53 +01:00
Gary Verhaegen	6815304bee	fix release job condition for broken Azure Pipelines (#2753 )	2019-09-11 14:30:02 +02:00
Moritz Kiefer	f7befca723	Get ghcide from the new upstream repo (#2867 ) * Get ghcide from the new upstream repo * Update azure-pipelines.yml Co-Authored-By: Gary Verhaegen <gary.verhaegen@digitalasset.com>	2019-09-11 08:57:48 +02:00
Moritz Kiefer	d2b68d45d4	Rename hie-core to ghcide (#2820 ) * Rename hie-core to ghcide The name `hie-core` has caused a lot of confusion as to how we relate to haskell-ide-engine so changing it should hopefully help with that. I also think that ghcide is still a good name once we hopefully integrate with haskell-ide-engine more closely. The name ghcide seems to have a reasonable amount of support on Twitter https://twitter.com/ndm_haskell/status/1170681262987710464 which is of course the only good way to come up with names. * Add a readme that points people to the new directory. * Fix bogus replacements * Use a proper link * links are hard	2019-09-09 13:55:16 +00:00
Gary Verhaegen	c97c1b1569	fix azure (#2750 ) fix release job condition for broken Azure Pipelines	2019-09-04 19:54:50 +01:00
Gary Verhaegen	97e289e548	make release step idempotent (#2705 ) At the moment, if we try to run the build of a release commit after it has succeeded, the `git tag` step fails. We do not normally try to rebuild old commits that had succeeded, but sometimes Azure gets confused when we ask for rerunning specific jobs within a build, and reruns the whole build. This creates two (small, bubt annoying) problems: 1. It adds noise to the #team-daml channel as it notifies of the build failure, and 2. It marks the commit as failed with a red cross on the github list of commits, which obviously doesn't look great for release commits. This fixes that. Note that if a release does fail after the `git tag` step (e.g. some network error between Azure and GitHub), this does not change the necessary steps to remediate, as that situation would already be broken in the current setup. (Steps to remediate would essentially boil down to deleting the tag on GitHub before rerunning so the build can create it again.)	2019-08-29 15:51:37 +01:00
Andreas Herrmann	5481f85030	hie-core tests flaky (#2646 ) Rerun up to three times. Previous attempt only changed the display name not the actual command.	2019-08-23 08:42:29 +00:00
Andreas Herrmann	f4f9aa9b97	hie-core stack tests: Mark as flaky (#2607 )	2019-08-20 14:22:02 +00:00
Gary Verhaegen	a15256571a	macOS: cache the bazel repo cache (#2420 )	2019-08-07 11:04:49 +01:00
Gary Verhaegen	351bc0021f	Revert "work around flaky Azure cache (#2377 )" (#2385 ) This reverts commit `9da11c92d8`.	2019-08-02 18:47:46 +01:00

1 2 3 4

181 Commits