Commit Graph

237 Commits

Author SHA1 Message Date
Gary Verhaegen
19c658ae15
ci/cron: move temp file handling to Haskell (#7596)
CHANGELOG_BEGIN
CHANGELOG_END
2020-10-07 17:45:00 +02:00
Gary Verhaegen
c238985bf9
ci/cron: actually fail on invalid signature (#7592)
At the moment, because the signature check appears in a `if` statement,
failed signatures do not actually fail the script and would thus still
result in "success" messages to Slack.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-07 13:12:13 +02:00
Gary Verhaegen
5799557570
ci/cron: check all releases (#7586)
This walks through the paginated GH API to fetch all releases and check
their signatures.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-07 12:04:08 +02:00
Gary Verhaegen
4db8c3ada1
ci/cron: move Bash cron inside Haskell script (#7585)
Yes, this is how I write Haskell. I'm told it's an improvement over
Bash.

Jokes aside, plan is to chip away at the Bash script, starting with
replacing the outermost loop with a proper "get _all_ releases" call
from Haskell, but I like keeping things working in small steps, and even
long-term I have no desire to reimplement the gpg signature checking
code in Haskell.

I have tested that things still work (on my machine); the only
difference is that we now only get the full output all at once at the
end, rather than one signature at a time. I don't think anyone is
looking at the output in real-time, so this should not be a huge issue.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-06 18:45:29 +02:00
Gary Verhaegen
21374713d0
ci/cron: add opt-parse applicative (#7583)
As requested in [previous PR].

[previous PR]: https://github.com/digital-asset/daml/pull/7569#discussion_r499733636

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-06 17:14:27 +02:00
Gary Verhaegen
6d94208226
stop notifying shayne on pr builds (#7581)
CHANGELOG_BEGIN
CHANGELOG_END
2020-10-06 14:20:43 +02:00
Gary Verhaegen
67746b7710
ci/cron: add arg to select docs (#7569)
This is a preparatory step for moving at least some of the logic of
checking signatures to this script. The reasoning for putting signatures
in the same script basically boils down to "it already has GitHub
pagination".

I also removed the `run.sh` wrapper because it did not add anything
anymore. It used to be useful, but across various changes it's sort of
lost its purpose.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-05 19:35:09 +02:00
Gary Verhaegen
eb6b2ce1c6
ci/cron: small cleanup (#7570)
Small improvements I noticed could be made while working on #7569, in a
separate PR because they're quite unrelated.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-05 19:33:48 +02:00
Gary Verhaegen
2973228f77
signature check: report to Slack (#7568)
If a tree falls in the forest and all that.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-05 17:31:11 +02:00
Gary Verhaegen
fda2eca084
periodically check signatures (#7543)
This is a first, very incomplete step in the spirit of small,
incremental PRs. Known missing features:

- Should check all versions, not just the 30 most recent ones.
- Should also download from GCP backup and compare.
- Should alert on Slack if anything is unexpected.
- Should handle versions prior to us starting to sign (and do what?).
- Should also check artifacts in Artifactory, not just GitHub Releases.
- Optionally should save to GCP if we don't have a backup already.

So at the moment it's just downloading the artifacts for the 30 most
recent releases and printing a message stating whether we have a
signature and whether it's valid.

CHANGELOG_BEGIN
CHANGELOG_END
2020-10-01 21:01:42 +02:00
Gary Verhaegen
60b300199b
improve "patch bazel windows" UX (#6764)
This does not get used very often so it is likely nobody will remember
how it works when we do use it. It's And due to the ordering Azure makes
of jobs in its UI, it's very easy to miss that there is a final,
Linux-based step and the values are actually printed there.

So this adds a little note to remind us of that.

Note that as this changes the `ci/patch_bazel_windows` folder, this will
also generate a new Bazel, so this PR will also update the Scoop
reference.

CHANGELOG_BEGIN
CHANGELOG_END
2020-09-30 14:09:51 +02:00
Moritz Kiefer
1243afc4a1
Remove reference to release-notes.rst (#7524)
* Remove reference to release-notes.rst

https://github.com/digital-asset/daml/pull/7458 shuffled this
around. While we could update it, it doesn’t really make any sense. We
post our release notes to the blog now and not in the docs so this
whole checkout procedure is redundant. This is also true if we wanted
to make a bugfix release for a release < 1.5 where this file still
existed. The trigger_sha is always on master (following our current
release process) so the file would still not exist.

I did also remove it from the docs cronjob. We never reupload old docs
so this doesn’t make a difference.

changelog_begin
changelog_end

* Stupid whitespace change because windows is pissing me off

changelog_begin
changelog_end
2020-09-30 11:34:43 +00:00
Moritz Kiefer
89c0f6ca41
Bump test_sha (#7512)
This changed in #7501 but it’s a harmless change.

changelog_begin
changelog_end
2020-09-29 11:01:01 +00:00
Sofia Faro
c5d145358d
Patch ghc to add a daml version header marker. (#7489)
* Update ghc to add a daml version header marker.

(WIP)

changelog_begin
changelog_end

* update stack snapshot

* See what happens with daml-doc

* update patch

* update stack snapshot

* unpin stackage

* Update daml docs tests

* Remove version header ann when parsing module doc

* Update the test

* Set final patch commit SHA.

* update stack snapshot

* unpin stackage (mac/linux)

* lint

* unpin stackage (win)
2020-09-28 17:01:20 +00:00
Remy
65f1a247d0
Bump test_sha in perf tests (#7441)
CHANGELOG_BEGIN
CHANGELOG_END
2020-09-18 18:30:39 +00:00
Gary Verhaegen
29b876920e
ci/patch_bazel_windows: fail on curl failure (#7431)
At the moment, with the `curl` call comfortably nested inside an `if`
statement, `curl` failures are interpreted as simply going down the
`else` branch of the conditional. Hoisting it up to a separate variable
declaration makes the script fail on the curl call itself.

CHANGELOG_BEGIN
CHANGELOG_END
2020-09-18 15:57:21 +02:00
Gary Verhaegen
f98b92d7ba
reset Windows cache (#7423)
CHANGELOG_BEGIN
CHANGELOG_END
2020-09-16 22:38:40 +02:00
Leonid Shlyapnikov
d13e7aa184
JSON API daily perf test cron job (#7406)
* Add http-json-perf daily cron job

changelog_begin
changelog_end

* commenting out other jobs so we can manually test the new one

* commenting out other jobs so we can manually test the new one

* Fix the shell script

* Fixing the gs bucket, `gs://http-json bucket does not exist`

* uncomment the other jobs

* timestamp from git log

* get rid of DAR copying

* comment out the other jobs, so we can test it

* uncomment the other jobs
2020-09-16 13:02:02 -04:00
Gary Verhaegen
9632a2fa43
fix check-changelog (#7419)
In #7386 I inadvertently removed the default value for the argument
`check-changelog.sh` takes. This did not break CI because CI always
passes the argument explicitly, but it did break local workflows for
some developers.

Apologies.

CHANGELOG_BEGIN
CHANGELOG_END
2020-09-16 13:19:37 +00:00
Gary Verhaegen
b2d58a3304
save daily perf results (#7396)
It's a real shame I forgot to do this sooner, but better late than never
I suppose.

CHANGELOG_BEGIN
CHANGELOG_END
2020-09-14 18:38:31 +00:00
Gary Verhaegen
bcee8f9152
let dependabot bypass changelog check (#7386)
Currently, when Dependabot makes a PR, not only do we need to manually
trigger CI through `/azp run`, we also need to either change the commit
or add a new one to pass the changelog check. This addresses that second
part by making a exception for PRs signed by dependabot.

This is also a bit of an excuse to play with git signatures and gpg.

CHANGELOG_BEGIN
CHANGELOG_END
2020-09-14 14:04:37 +02:00
Moritz Kiefer
c3d758a2d3
Ignore failed ipcrm calls (#7218)
* Ignore failed ipcrm calls

changelog_begin
changelog_end

* Update ci/clear-shared-segments-macos.yml

Co-authored-by: Samir Talwar <samir.talwar@digitalasset.com>

Co-authored-by: Samir Talwar <samir.talwar@digitalasset.com>
2020-08-25 08:39:11 +00:00
Moritz Kiefer
5936644970
Bump windows cache (#7212)
changelog_begin
changelog_end
2020-08-24 18:19:28 +02:00
Gary Verhaegen
6d1adee92f
push script-runner and trigger-runner to Artifactory (#7196)
I have created the corresponding user and repositories on Artifactory,
and tested the `curl` command manually. I'll add the corresponding
credentials to Azure once this is approved.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-20 19:11:27 +02:00
Gary Verhaegen
51e7c88bf5
announce release rotation on Tuesday (#7151)
It's been pointed out to me that some people actually plan their work,
and for them, it would be useful to know at least a day in advance that
they will be doing a release. This PR attempts to accommodate that.

Note: this will run as a new, separate cron. I'll do the Azure setup
after this gets approved.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-17 13:42:35 +02:00
Gary Verhaegen
d43419d339
update perf sha (#7140)
Change in Speedy API.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-14 12:55:06 +02:00
Gary Verhaegen
1baea84ca0
fix auth header for compat pr (#7134)
On the last release, the job succeeded despite no being able to create
the compat PR. This fixes:

- The curl call to actually return non-0 on non-2xx HTTP response.
- The way in which we encode the credentials.

This also attempts to create a Bash library, hopefully this time in a
way that doesn't get destroyed by our release process. IIUC pipeline
instructions (YAML files) are all parsed and read before any execution,
so by embedding the Bash library in a template we should get the correct
version (i.e. the one that is running the pipeline) even when checking
out other commits.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-14 11:35:57 +02:00
Moritz Kiefer
67f350694c
Fix release cron job (#7095)
CI failed because $4 was unset. We explicitly check later if it is set
so this is intentional, we just need to actually get to that check and
not fail due to set -u.

changelog_begin
changelog_end
2020-08-12 10:50:28 +02:00
Stephen Compall
6306b9f8d8
perf test harmlessly changed in #6907; match sha (#7065)
CHANGELOG_BEGIN
CHANGELOG_END
2020-08-10 07:26:18 +00:00
Moritz Kiefer
31c2ce0220
Bump Windows cache (#7056)
The "output was not created" errors seem to have become very
frequent. While taking out nodes seems to work as a bandaid, I’d like
to see if resetting the cache buys us a few days of not having to deal
with this. Admittedly, I don’t really have an explanation for why
resetting the cache should help if taking out the machines seems to do
something (suggesting that it hasn’t propagated fully).

changelog_begin
changelog_end
2020-08-07 09:05:32 +00:00
Gary Verhaegen
00f3de63c9
rotate responsibility for release process (#7011)
This PR attempts to add some automation around assigning release
management. The PR adds a file `release/rotation`; each week, the
updated CI cron job will:

- Open a PR for the new release [as current].
- Assign the first user in the file to that PR.
- Add the Standard-Change label to the PR.
- Start the build for that PR [as current].
- Open a new PR that rotates the `release/rotate` file, i.e. pushes back
  the first line to the end of the file.

This PR also adds mentions of the "release handler" (the first line of
`release/rotation`) to the various messages we send to Slack along the
release process.

The initial state of the `release/rotation` file has been created by
listing all the volunteers (Language team, Application Runtime team, as
well as @SamirTalwar-DA and @stefanobaghino-da) and piping the file
through `shuf`. (Then I put myself at the top so I can hopefully iron
out the issues with the first attempt.)

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-05 18:58:56 +02:00
Gary Verhaegen
1ff75f1256
remove monthly report (#6967)
This script is no longer relevant to our internal processes. The report
is now generated by the security team and validated by us, rather than
produced and validated by us.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-04 12:01:07 +02:00
Remy
cc0b497d23
Bump test_sha in perf tests (#6972)
CHANGELOG_BEGIN
CHANGELOG_END
2020-08-03 17:21:49 +00:00
Gary Verhaegen
614b4298a1
update base image for SDK Docker image (#6970)
The openjdk-alpine images we were relying on so far do not have security
updates anymore.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-03 18:20:31 +02:00
Gary Verhaegen
ef465de0a8
run build on automated release PRs (#6964)
Even though the Azure Pipelines bot account _clearly_ has write access
to our repo, as it can create the PR, it does not count as having write
access for the purposes of Azure deciding to run the build on the PRs it
opens. Not having the build run on the release PR would defeat the whole
point of having it, so this adds a little nudge to Azure so it does
start the build after opening the PR.

CHANGELOG_BEGIN
CHANGELOG_END
2020-08-03 16:42:00 +02:00
Samir Talwar
11bc582a9e
Publish DAML on SQL to GitHub Releases. (#6876) 2020-07-27 17:33:20 +02:00
Gary Verhaegen
8043756883
better release triggers (#6859)
Based on feedback from @nickchapman-da, this PR aims at making the
release process easier by:

- Automatically opening a release PR on Wednesday morning. The goal here
  is that by the time we start working, there is a release already
  built, so we save about an hour on waiting for that. This obviously
  doesn't help with ad-hoc releases.
- On a release PR build, posting to Slack when the release is ready to
  merge.
- On a release master build, posting to Slack when a release is ready to
  be tested.

My hope is that this makes the release process less tedious. This is not
trying to address the actual release testing, but hopefully should
reduce the annoyance of having to constantly go and check if the release
is ready.

CHANGELOG_BEGIN
CHANGELOG_END
2020-07-24 16:40:11 +00:00
Gary Verhaegen
97d1fa1e04
fix docs cron (#6864)
I lost the public ACL in #6817.

CHANGELOG_BEGIN
CHANGELOG_END
2020-07-24 16:13:00 +00:00
Gary Verhaegen
818a52b094
simplify docs cron (#6817)
simplify docs cron

This commit changes the "live state" to be that all versions are there
on S3, most of them hidden the way snapshots currently are, and only
displays in the drop-down the list of "supported" versions, i.e. stable
and >= 1.0.0.

The docs cron will now:

- Get list of versions from GitHub (as it does now)
- Get list of versions from S3 (as it does now: versions.json +
  snapshots.json, though it assumes we'll have a follow-up PR to change
  the latter to hidden.json)
- Compare; if the sets of versions are the same, stop there. (Note: this
  "set of versions" here includes the notion of which versions are shown,
  not just which ones exist. See the Versions data type in the code.)
- If there is a new hidden version, just build that, push it, change
  nothing else. No need to download any of the existing versions or mess
  around with anything else (except updating `hidden.json`, otherwise
  we're going to be doing this way too often.)
- If there is a new visible version:
  - check if we have it locally (i.e. from the previous step: it's a
    version we just added)
  - figure out the old and new default versions, and then apply the diff
    to the top-level directory. Basically download the two folders, list
    files that exist in the old one and not in the new one, delete those
    from S3, then push the new one to the top-level on S3.
- update versions.json & hidden.json (and for now snapshots.json)

This means that:

- we never mess with the existing versions; we don't need to download
  them, we don't need to change them, we don't clean them up. Old links
  keep working forever.
- The running time for the docs cron is roughly constant, in that it
  should very rarely have to either build or upload (or download) more
  than 2 versions per run, and if those instances happen they'd be
  accidents (we made 3 actual releases in an hour), not build-up over
  time.
2020-07-24 14:40:32 +02:00
Andreas Herrmann
4b1438276c
Update Bazel 2.1.0 --> 3.3.1 (#6761)
* Upgrade nixpkgs revision

* Remove unused minio

It used to be used as a gateway to push the Nix cache to GCS, but has
since been replaced by nix-store-gcs-proxy.

* Update Bazel on Windows

changelog_begin
changelog_end

* Fix hlint warnings

The nixpkgs update implied an hlint update which enabled new warnings.

* Fix "Error applying patch"

Since Bazel 2.2.0 the order of generating `WORKSPACE` and `BUILD` files
and applying patches has been reversed. The allows users to define
patches to these files that will not be immediately overwritten.
However, it also means that patches on another repository's original
`WORKSPACE` file will likely become invalid.

* a948eb7255
* https://github.com/bazelbuild/bazel/issues/10681

Hint: If you're generating a patch with `git` then you can use the
following command to exclude the `WORKSPACE` file.

```
git diff ':(exclude)WORKSPACE'
```

* Update rules_nixpkgs

* nixpkgs location expansion escaping

* Drop --noincompatible_windows_native_test_wrapper

* client_server_test using sh_inline_test

client_server_test used to produce an executable shell script in form of
a text file output. However, since the removal of
`--noincompatible_windows_native_test_wrapper` this no longer works on
Windows since `.sh` files are not directly executable on Windows.

This change fixes the issue by producing the script file in a dedicated
rule and then wrapping it in a `sh_test` rule which also works on
Windows.

* daml_test using sh_inline_test

* daml_doc_test using sh_inline_test

* _daml_validate_test using sh_inline_test

* damlc_compile_test using sh_inline_test

* client_server_test find .exe on Windows

* Bump Windows cache for Bazel update

Remove `clean --expunge` after merge.

Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>
2020-07-23 09:46:04 +02:00
Remy
28ab504b21
Bump test_sha in perf tests (#6825)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-22 11:00:53 +00:00
Moritz Kiefer
edd84a09d5
Fix reference to return produced by ApplicativeDo (#6821)
* Fix reference to return produced by ApplicativeDo

see https://github.com/digital-asset/ghc/pull/53 for details.

fixes #6820

changelog_begin
changelog_end

* bump to merged commit

changelog_begin
changelog_end

* switch to new ghc-lib

changelog_begin
changelog_end
2020-07-22 10:09:23 +00:00
Remy
d538d9a53e
Bump test_sha in perf tests (#6816)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-21 16:56:01 +00:00
Robert Autenrieth
7ce9748066
Split sandbox code into separate packages (#6695)
* Move public code into daml-integration-api

CHANGELOG_BEGIN
[DAML Integration Kit]: Removed sandbox specific code from the API intended to be used by ledger integrations. Use the maven coordinates ``com.daml:participant-integration-api:VERSION`` instead of ``com.daml:ledger-api-server`` or ``com.daml:sandbox``.
CHANGELOG_END
2020-07-17 17:06:06 +02:00
Moritz Kiefer
147a2700c0
Bump Windows cache (#6770)
To “fix” the “output was not created” errors.

changelog_begin
changelog_end
2020-07-17 12:41:10 +02:00
Moritz Kiefer
52b9eabbcc
Revert "refactor ci jobs: add setvar to ci/lib.sh (#6708)" (#6732)
This reverts commit 61e9df3eaf.

This interacts very badly with the fact that we check out old commits
for releases. While we could fix it for this particular issue, I don’t
think this buys us enough to make this worth doing and it makes it
easy to introduce issues in the future if we modify lib.sh

changelog_begin
changelog_end
2020-07-14 23:53:49 +02:00
Gary Verhaegen
61e9df3eaf
refactor ci jobs: add setvar to ci/lib.sh (#6708)
CHANGELOG_BEGIN
CHANGELOG_END
2020-07-13 17:34:54 +02:00
Moritz Kiefer
631ed3e891
Bump timeouts in compat tests (#6689)
This bumps the timeout of the compat tests on PRs to 360 minutes
matching other jobs on a PR (we mainly hit this if ghc-lib is rebuilt)
and the timeout on the daily jobs to 720 minutes (we hit this if
_everything_ is rebuilt).

I am slightly worried about the timeout on the daily job. After having
taken a look at it, there are a few reasons how we ended up here:

1. We started including more tests, e.g., sandbox-classic. Not much we
   can do here, those tests are useful.

2. We have a very large number of snapshots for 1.3.0. There are a few
   reasons for this:

   1. Timing: We branched off early for the 1.2.0 release so the first
      snapshot for 1.3 was on June 3th. For 1.4 it looks like the first
      snapshot will be on July 15th so that’s roughly 2 extra
      snapshots just due to timing.

   2. Additional snapshots: We had one broken snapshot due to a broken
      VSCode extension that we didn’t delete (probably not worth doing
      at this point). We also had to backport to an old snapshot which
      resulted in another extra snapshot. We also had one extra
      snapshot which was supposed to be the RC but wasn’t since the
      ANF revert needed to go in.

   The only thing that is clearly useless is the one broken snapshot
   but that doesn’t change things that much. I see 2 orthogonal
   options for improving this assuming we agree that the current
   runtime is worryingly high.

   1. Prune snapshots more aggressively, e.g., only include the last 3
      snapshots. That’s a pretty arbitrary decision but it would
      enforce a hard limit.

   2. Reduce test combinations. E.g., only test snapshots vs stable
      releases but not snapshots vs snapshots.

3. We end up forcing a full build quite frequently. Here are just 2
   examples of how we’ve done that so far.

   1. Upgrade rules_haskell. Basically all tests are run by a Haskell
      binary so this forces a full rebuild.

   2. Change runfiles of `daml`.

I don’t think there is much we can do about 1 or 3 which leaves us
with 2. One not entirely unreasonable option is to just do nothing. We
did have periods where things went pretty smoothly for the most part
and each month we reset to a much smaller number of releases (we also
have to start throwing out old stable releases at some
point). Otherwise reducing the number of test combinations seems the
most promising option to me.

changelog_begin
changelog_end
2020-07-10 12:34:53 +00:00
Moritz Kiefer
6c0bbd3ba6
Bump test_sha in perf tests (#6649)
This changed by the revert of the ANF changes which is harmless by the
same reasoning that made bumping it harmless when we introduced it.

changelog_begin
changelog_end
2020-07-08 12:26:11 +00:00
Samir Talwar
89369b3bb9
CI: Increase the PostgreSQL connections from 100 to 200. (#6647)
We saw a flake recently where PostgreSQL stopped accepting connections
during a CI run, leading the build to fail. This increases the number of
connections to 200 from the default of 100, hopefully mitigating issues
such as this one.

CHANGELOG_BEGIN
CHANGELOG_END
2020-07-08 10:49:11 +00:00