Unfortunately, I missed the fact that we had our own logic for
handling process failures which resulted in uncatchable
exceptions. I’ve changed one place to use the upstream handling and
the other to call `fail` which throws an IOException like I would have
expected.
changelog_begin
changelog_end
The change in #8191 to publish daml on sql docs failed because the
versions.json and snapshots.json files don’t exist initially. This PR
fixes that by catching the exception and treating it as an empty file.
changelog_begin
changelog_end
* add blackduck scan to run on master (#6130)
* add blackduck scan
* disable go scanning
exclude entire language-support/ts directory for node scanning
break to multiple lines to make command line params easier to parse
* Increase timeout for blackduck binary scan
* update blackduck scan config
* remove some exclusions, force python3
* exclude GO until path to go executable can be resolved
* added readme explanation of why we want this file
* fail in case of policy violation
* ensure haskell bazel scan completes before running second round scan for bazel jvm and node and other langs
* trigger notices file gen to ensure BOM complete
* remove trailing end of lines
* run with latest detect version and unique code location name changes to wrapper script
* Add blackduck to daily compat job
* DO NOT MERGE: condition false to disable other jobs for testing
* remove parameters not available to cronjob
* Revert changes to regular CI pipeline
CHANGELOG_BEGIN
CHANGELOG_END
Signed-off-by: Brian Healey <brian.healey@digitalasset.com>
* Do not get branch name from variable
* Upgrade com.fasterxml.jackson.core:jackson-databind to 2.12.0 to address security vulnerability
* Remove disabling of other jobs, set to branch to be used on prod runs
* Apply suggestions from code review
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Address code review comments
* Updated NOTICES file
* Run bazel build, update NOTICES file
* Correct dade-assist
* do not have perms to pipe to dev/null
* Add md file explaining how to update NOTICES file
* Add instructions for running blackduck locally
* Add a link to full security-blackduck readme
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
I wanted to suggest that on the PR but caught it after it was merged. So
I made a note of it, which promptly got lost. As the end of the year
approaches I've started trying to clean up a little.
CHANGELOG_BEGIN
CHANGELOG_END
Having the cron push artifacts to GCP was really only meant to happen
once. I got distracted and worked on other things. This PR closes that
work loop such that the current state and expectations are:
- Every new release pushes to GCP as part of the release process.
- The cron only checks that the GCP backup exists and matches, but does
not push if it doesn't.
The reason for this is we want the cron job to fail if there are
additional, unexpected files in a release, rather than automatically
commit those files for the long term.
CHANGELOG_BEGIN
CHANGELOG_END
This is the macOS part of #5912, which I have separated because our
macOS nodes have a different deployment process so it seemed easier to
track the deployment of the change separately.
CHANGELOG_BEGIN
CHANGELOG_END
GitHub requests that when we use the API without a token, which is what
we do here, we use a user-agent header that allows them to contact us in
case there is an issue they need to discuss. This PR updates that
address, and cleans-up a bit of duplication around it.
CHANGELOG_BEGIN
CHANGELOG_END
This PR allows the script to run without GCP credentials. It will
obviously then skip the bits that require GCP credentials, but that
still leaves it with plenty of things to do.
Because checking all releases can still be quite long (around an hour on
CI, and my personal connection is a bit slower), this also introduces a
new parameter that restricts the number of releases to test.
CHANGELOG_BEGIN
CHANGELOG_END
Currently the return code of the function is the return code of the
`eval "$restore_trap"` line, whereas semantically we want the return
code of the `gsutil` call. This is not an issue in most cases as the
`set -e` should kick in, but if the function appears as the condition in
an `if` statement the `-e` flag is suspended.
The main use-case right now is that the daily license check is _not_
uploading artifacts.
CHANGELOG_BEGIN
CHANGELOG_END
Change the path used to push to the backup gcs bucket to match what is
put by the release script. This needs to get merged before we run the
next daily.
CHANGELOG_BEGIN
CHANGELOG_END
If we don't already have a copy of an artifact in our "disaster
recovery" storage box, put one.
Note: as implemented, this upload mechanism happens only if the release
was successfully verified signature-wise, so this should not result in
us saving broken artifacts. Also, CI does not have deletion or overwrite
access to this bucket, so overall this should be pretty safe.
CHANGELOG_BEGIN
CHANGELOG_END
This PR extends the existing `save_gcp_data` function to handle any
`gsutil` command. This is done to support existence checking using
`gsutil ls` for private artifacts in release checking (`ci/cron`).
CHANGELOG_BEGIN
CHANGELOG_END
At the moment, because the signature check appears in a `if` statement,
failed signatures do not actually fail the script and would thus still
result in "success" messages to Slack.
CHANGELOG_BEGIN
CHANGELOG_END
Yes, this is how I write Haskell. I'm told it's an improvement over
Bash.
Jokes aside, plan is to chip away at the Bash script, starting with
replacing the outermost loop with a proper "get _all_ releases" call
from Haskell, but I like keeping things working in small steps, and even
long-term I have no desire to reimplement the gpg signature checking
code in Haskell.
I have tested that things still work (on my machine); the only
difference is that we now only get the full output all at once at the
end, rather than one signature at a time. I don't think anyone is
looking at the output in real-time, so this should not be a huge issue.
CHANGELOG_BEGIN
CHANGELOG_END
This is a preparatory step for moving at least some of the logic of
checking signatures to this script. The reasoning for putting signatures
in the same script basically boils down to "it already has GitHub
pagination".
I also removed the `run.sh` wrapper because it did not add anything
anymore. It used to be useful, but across various changes it's sort of
lost its purpose.
CHANGELOG_BEGIN
CHANGELOG_END
This is a first, very incomplete step in the spirit of small,
incremental PRs. Known missing features:
- Should check all versions, not just the 30 most recent ones.
- Should also download from GCP backup and compare.
- Should alert on Slack if anything is unexpected.
- Should handle versions prior to us starting to sign (and do what?).
- Should also check artifacts in Artifactory, not just GitHub Releases.
- Optionally should save to GCP if we don't have a backup already.
So at the moment it's just downloading the artifacts for the 30 most
recent releases and printing a message stating whether we have a
signature and whether it's valid.
CHANGELOG_BEGIN
CHANGELOG_END
* Remove reference to release-notes.rst
https://github.com/digital-asset/daml/pull/7458 shuffled this
around. While we could update it, it doesn’t really make any sense. We
post our release notes to the blog now and not in the docs so this
whole checkout procedure is redundant. This is also true if we wanted
to make a bugfix release for a release < 1.5 where this file still
existed. The trigger_sha is always on master (following our current
release process) so the file would still not exist.
I did also remove it from the docs cronjob. We never reupload old docs
so this doesn’t make a difference.
changelog_begin
changelog_end
* Stupid whitespace change because windows is pissing me off
changelog_begin
changelog_end
* Add http-json-perf daily cron job
changelog_begin
changelog_end
* commenting out other jobs so we can manually test the new one
* commenting out other jobs so we can manually test the new one
* Fix the shell script
* Fixing the gs bucket, `gs://http-json bucket does not exist`
* uncomment the other jobs
* timestamp from git log
* get rid of DAR copying
* comment out the other jobs, so we can test it
* uncomment the other jobs
It's been pointed out to me that some people actually plan their work,
and for them, it would be useful to know at least a day in advance that
they will be doing a release. This PR attempts to accommodate that.
Note: this will run as a new, separate cron. I'll do the Azure setup
after this gets approved.
CHANGELOG_BEGIN
CHANGELOG_END
On the last release, the job succeeded despite no being able to create
the compat PR. This fixes:
- The curl call to actually return non-0 on non-2xx HTTP response.
- The way in which we encode the credentials.
This also attempts to create a Bash library, hopefully this time in a
way that doesn't get destroyed by our release process. IIUC pipeline
instructions (YAML files) are all parsed and read before any execution,
so by embedding the Bash library in a template we should get the correct
version (i.e. the one that is running the pipeline) even when checking
out other commits.
CHANGELOG_BEGIN
CHANGELOG_END
CI failed because $4 was unset. We explicitly check later if it is set
so this is intentional, we just need to actually get to that check and
not fail due to set -u.
changelog_begin
changelog_end
This PR attempts to add some automation around assigning release
management. The PR adds a file `release/rotation`; each week, the
updated CI cron job will:
- Open a PR for the new release [as current].
- Assign the first user in the file to that PR.
- Add the Standard-Change label to the PR.
- Start the build for that PR [as current].
- Open a new PR that rotates the `release/rotate` file, i.e. pushes back
the first line to the end of the file.
This PR also adds mentions of the "release handler" (the first line of
`release/rotation`) to the various messages we send to Slack along the
release process.
The initial state of the `release/rotation` file has been created by
listing all the volunteers (Language team, Application Runtime team, as
well as @SamirTalwar-DA and @stefanobaghino-da) and piping the file
through `shuf`. (Then I put myself at the top so I can hopefully iron
out the issues with the first attempt.)
CHANGELOG_BEGIN
CHANGELOG_END
This script is no longer relevant to our internal processes. The report
is now generated by the security team and validated by us, rather than
produced and validated by us.
CHANGELOG_BEGIN
CHANGELOG_END
Even though the Azure Pipelines bot account _clearly_ has write access
to our repo, as it can create the PR, it does not count as having write
access for the purposes of Azure deciding to run the build on the PRs it
opens. Not having the build run on the release PR would defeat the whole
point of having it, so this adds a little nudge to Azure so it does
start the build after opening the PR.
CHANGELOG_BEGIN
CHANGELOG_END
Based on feedback from @nickchapman-da, this PR aims at making the
release process easier by:
- Automatically opening a release PR on Wednesday morning. The goal here
is that by the time we start working, there is a release already
built, so we save about an hour on waiting for that. This obviously
doesn't help with ad-hoc releases.
- On a release PR build, posting to Slack when the release is ready to
merge.
- On a release master build, posting to Slack when a release is ready to
be tested.
My hope is that this makes the release process less tedious. This is not
trying to address the actual release testing, but hopefully should
reduce the annoyance of having to constantly go and check if the release
is ready.
CHANGELOG_BEGIN
CHANGELOG_END
simplify docs cron
This commit changes the "live state" to be that all versions are there
on S3, most of them hidden the way snapshots currently are, and only
displays in the drop-down the list of "supported" versions, i.e. stable
and >= 1.0.0.
The docs cron will now:
- Get list of versions from GitHub (as it does now)
- Get list of versions from S3 (as it does now: versions.json +
snapshots.json, though it assumes we'll have a follow-up PR to change
the latter to hidden.json)
- Compare; if the sets of versions are the same, stop there. (Note: this
"set of versions" here includes the notion of which versions are shown,
not just which ones exist. See the Versions data type in the code.)
- If there is a new hidden version, just build that, push it, change
nothing else. No need to download any of the existing versions or mess
around with anything else (except updating `hidden.json`, otherwise
we're going to be doing this way too often.)
- If there is a new visible version:
- check if we have it locally (i.e. from the previous step: it's a
version we just added)
- figure out the old and new default versions, and then apply the diff
to the top-level directory. Basically download the two folders, list
files that exist in the old one and not in the new one, delete those
from S3, then push the new one to the top-level on S3.
- update versions.json & hidden.json (and for now snapshots.json)
This means that:
- we never mess with the existing versions; we don't need to download
them, we don't need to change them, we don't clean them up. Old links
keep working forever.
- The running time for the docs cron is roughly constant, in that it
should very rarely have to either build or upload (or download) more
than 2 versions per run, and if those instances happen they'd be
accidents (we made 3 actual releases in an hour), not build-up over
time.
This bumps the timeout of the compat tests on PRs to 360 minutes
matching other jobs on a PR (we mainly hit this if ghc-lib is rebuilt)
and the timeout on the daily jobs to 720 minutes (we hit this if
_everything_ is rebuilt).
I am slightly worried about the timeout on the daily job. After having
taken a look at it, there are a few reasons how we ended up here:
1. We started including more tests, e.g., sandbox-classic. Not much we
can do here, those tests are useful.
2. We have a very large number of snapshots for 1.3.0. There are a few
reasons for this:
1. Timing: We branched off early for the 1.2.0 release so the first
snapshot for 1.3 was on June 3th. For 1.4 it looks like the first
snapshot will be on July 15th so that’s roughly 2 extra
snapshots just due to timing.
2. Additional snapshots: We had one broken snapshot due to a broken
VSCode extension that we didn’t delete (probably not worth doing
at this point). We also had to backport to an old snapshot which
resulted in another extra snapshot. We also had one extra
snapshot which was supposed to be the RC but wasn’t since the
ANF revert needed to go in.
The only thing that is clearly useless is the one broken snapshot
but that doesn’t change things that much. I see 2 orthogonal
options for improving this assuming we agree that the current
runtime is worryingly high.
1. Prune snapshots more aggressively, e.g., only include the last 3
snapshots. That’s a pretty arbitrary decision but it would
enforce a hard limit.
2. Reduce test combinations. E.g., only test snapshots vs stable
releases but not snapshots vs snapshots.
3. We end up forcing a full build quite frequently. Here are just 2
examples of how we’ve done that so far.
1. Upgrade rules_haskell. Basically all tests are run by a Haskell
binary so this forces a full rebuild.
2. Change runfiles of `daml`.
I don’t think there is much we can do about 1 or 3 which leaves us
with 2. One not entirely unreasonable option is to just do nothing. We
did have periods where things went pretty smoothly for the most part
and each month we reset to a much smaller number of releases (we also
have to start throwing out old stable releases at some
point). Otherwise reducing the number of test combinations seems the
most promising option to me.
changelog_begin
changelog_end