When we build a release, it is always a "past" commit: typically one
that has already been tested twice, once when the corresponding PR was
run, and then again as a "main"-branch commit.
Release branches don't run CI on their own, but their protection rules
enforce linear merges.
Either way, we know we're building a _good_ commit, and, assuming our
builds and tests are hermetic, testing that commit again when we make a
release is a pure waste of time and CPU resources.
The other case, where we make an ad-hoc release from a branch that has
not been merged, has a similar issue: we do not necessarily want to run
the full test suite, because part of the reason we need that commit may
be that it doesn't succeed as is.
Based on that observation, I wondered what might be the minimal set of
things we actually need to build when making a release. This PR is an
experiment in trying to find that out.
Right now building Linux/ARM fails on release branches. This is because:
- Release branches do not support Linux/ARM.
- Releases are built from `main`, using `main`'s version of YAML files.
This should skip over the Linux/ARM build when running on release
branches.
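For illustration, the skip could look something like the bash guard
below; the `release/` branch prefix and the use of Azure's
BUILD_SOURCEBRANCH variable are assumptions, not necessarily what the
YAML change does.
```
# Hedged sketch only: bail out of the Linux/ARM steps when the build was
# triggered from a release branch.
case "${BUILD_SOURCEBRANCH:-}" in
  refs/heads/release/*)
    echo "On a release branch; skipping the Linux/ARM build."
    exit 0
    ;;
esac
# ... Linux/ARM build steps would follow here ...
```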
Specifically:
- m1 builds
- BlackDuck & notices bump
- daily compatibility tests
- daily compat update (if needed)
- daily perf test & report
Also, merge the canton update jobs as that makes a lot more sense at
this point. Future divergences can be expressed by changing the files in
their respective branches.
This PR will need to be backported to main-2.x to fully take effect.
* do not run pr-only tests on main, do not run main-only tests on prs
* split data dep tests into main-only and pr-only
* run non-dev conformance tests on main only
This handles the used-to-be-rare-but-not-so-much-anymore case where the
job fails after having pushed its logs (without this change, the push
fails because we can't overwrite artifacts).
This was spurred by the fact that the "report_end" task sometimes fails
on m1 with the "install Bash lib" step just never finishing (and the
whole job then times out after 6h).
Hopefully by running fewer things we get fewer chances of these kinds of
weird issues.
Note that it's unclear whether anything actually crashes on the m1
machines or whether this is a loss of connection between Azure Pipelines
and the machine. From what I've seen, as soon as that job times out the
machine is able to successfully pick up other jobs. Speaking of
timeouts, I've also reduced the 6h ones to a more reasonable 3h.
We routinely have upwards of 3GB of logs. They are very rarely
downloaded; most people don't even know they're there. Uploading 3GB
takes time, so this should hopefully make it faster.
I also changed the CI config so we run this on every build but only
upload on releases. That should hopefully make sure we catch this
immediately next time. The script is fast enough that this shouldn't
slow things down meaningfully.
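As a rough sketch of the run-always, upload-on-release split (the script
name and release-detection variable are hypothetical):
```
# Run the check on every build; it is cheap enough to always execute.
./ci/the-check-script.sh > check-output.txt
# Only publish the result as a pipeline artifact on release builds.
if [ "${IS_RELEASE:-false}" = "true" ]; then
  echo "##vso[artifact.upload containerfolder=checks;artifactname=check-output]$PWD/check-output.txt"
fi
```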
changelog_begin
changelog_end
New year, new copyright, new expected unknown issues with various files
that won't be covered by the script and/or will be but shouldn't change.
I'll do the details on Jan 1, but would appreciate this being
preapproved so I can actually get it merged by then.
CHANGELOG_BEGIN
CHANGELOG_END
They migrated our account so hopefully this should work without any
other changes and fix our publishing issues.
I’m keeping the long timeout for now since I don’t know what an
appropriate timeout is.
changelog_begin
changelog_end
This has now screwed us over for two releases (1.14, and it is currently
blocking 1.15) because we didn't backport the change. While we could
backport it, that is annoying and provides little to no benefit given
that a failure here is harmless, so let's just ignore failures here.
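In bash terms this amounts to something like the line below, with a
hypothetical step name:
```
# A non-zero exit here should not fail the job; the result is harmless either way.
./ci/harmless-step.sh || echo "harmless step failed; ignoring"
```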
changelog_begin
changelog_end
`uname` yields the same name for both the Linux and Linux_scala_2_12
builds, which causes them to override each other. It looks like that
might even break in the case of concurrent uploads, although that could
also be general flakiness in Azure.
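A minimal sketch of the kind of disambiguation this calls for; the
variable names are illustrative, not the actual change:
```
# Derive the artifact name from both the OS and the Scala version so the two
# Linux jobs stop clobbering each other. SCALA_VERSION is hypothetical here.
ARTIFACT_NAME="$(uname)${SCALA_VERSION:+-scala-$SCALA_VERSION}"
echo "Uploading artifacts as: $ARTIFACT_NAME"
```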
changelog_begin
changelog_end
Even with the cache retries, something still doesn't seem to be cached
quite the way I expect. I can't really debug this without exec logs, so
this PR starts publishing those.
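For reference, Bazel can produce such logs via its execution-log flags;
whether this PR uses exactly this invocation is an assumption:
```
# Write a JSON log of every action Bazel executed, which shows what was
# (and wasn't) served from cache; the file can then be published as an artifact.
bazel build //... --execution_log_json_file="$(pwd)/exec-log.json"
```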
changelog_begin
changelog_end
* Move artifact publishing out of yaml files
The current publishing process pretty much hardcodes the set of
artifacts we publish in the yaml config. This is a problem because we
always release from `main`, so the yaml files are identical for every
release. However, we will add new artifacts over time, and at that point
this starts falling apart. This PR makes the process described in the
yaml files very generic: it just uploads and downloads everything in a
directory, while the details are handled in bash scripts that come from
the respective release branch and are therefore version-dependent.
As usual for this type of change, I don't have a great way to test it.
I did do some due diligence to check that at least the artifacts are
published correctly and that I can download them, but I can't test the
actual publishing.
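To make the shape of this concrete, here is a hedged sketch of what such
a version-dependent copy script might look like; the paths and artifact
names are purely illustrative, not the contents of
`ci/copy-unix-release-artifacts.sh`:
```
#!/usr/bin/env bash
# Sketch only: the yaml side uploads everything under $RELEASE_DIR wholesale,
# while a script like this one, taken from the release branch, decides what
# goes into that directory for its version.
set -euo pipefail

RELEASE_DIR=$1

mkdir -p "$RELEASE_DIR"
# Hypothetical artifacts:
cp bazel-bin/release/sdk-release-tarball.tar.gz "$RELEASE_DIR/"
cp bazel-bin/docs/html.tar.gz "$RELEASE_DIR/"
```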
changelog_begin
changelog_end
* Update ci/copy-unix-release-artifacts.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Update ci/copy-windows-release-artifacts.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Update ci/publish-artifactory.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Merge Maven uploads for different Scala versions
It turns out Maven will abort an existing staging operation if you
create a new one, which means our jobs race against each other. We could
try to fix that either by sequencing the jobs in a clever way (annoying,
and it can break things like rerunning only the parts that failed), or
by creating more profiles (it's unclear whether you can even have two
profiles for the same group id, and even if you can, it's annoying to
merge them).
So in this PR I (grudgingly) merged both uploads into the Haskell
script. This isn't all bad:
1. It moves some logic from bash embedded in yaml string literals into
Haskell code.
2. It duplicates some versions, but it removes duplication in other
places, so overall it's not too much worse.
3. It does, however, make things slower, since we no longer run this
stuff in parallel. That said, the release step is relatively small
(< 5min) and it only runs on Linux.
We could add CLI arguments to make the Scala versions configurable for
local development. Given that this is blocking releases, I wanted to
get something in that works first and then see what we need in that regard.
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* Fixup condition for running publish_mvn_npm
This needs to run for both linux and linux-scala-2.13
changelog_begin
changelog_end
* Update ci/build-unix.yml
Co-authored-by: Samir Talwar <samir.talwar@digitalasset.com>
Co-authored-by: Samir Talwar <samir.talwar@digitalasset.com>
My goal here is to investigate the new warning Azure has been showing
for the past few days:
> ##[warning]%25 detected in ##vso command. In March 2021, the agent command parser will be updated to unescape this to %. To opt out of this behavior, set a job level variable DECODE_PERCENTS to false. Setting to true will force this behavior immediately. More information can be found at https://github.com/microsoft/azure-pipelines-agent/blob/master/docs/design/percentEncoding.md
As far as I'm aware we are not deliberately passing in any `%25` in any
of our `vso` commands, so I was a bit surprised by this.
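For context, these `vso` commands are just specially formatted lines on
stdout, along the lines of the (made-up) example below; a literal `%25`
in the value part is what triggers the warning:
```
# Logging command setting a pipeline variable; if the value contains "%25",
# the agent will soon start unescaping it to "%".
echo "##vso[task.setvariable variable=download_url]https://example.com/some%25path"
```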
CHANGELOG_BEGIN
CHANGELOG_END
* include oauth2 logback config in release tarball
overlooked in https://github.com/digital-asset/daml/pull/8611
* Release trigger-service and oauth2-middleware JARs
changelog_begin
changelog_end
* drop from artifacts.yaml
Co-authored-by: Andreas Herrmann <andreas.herrmann@tweag.io>
Current reports look like:
```
Disk cache small enough:\n20G/home/vsts/.bazel-cache
```
because `echo` does not interpret the `\n` escape. An alternative would
be to replace `echo` with `printf`, but I have had enough issues with
subshells-in-strings lately that I prefer just avoiding them when
possible.
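For illustration, the two behaviours side by side; the exact `du`
invocation is an assumption about how the message is built:
```
# bash's builtin echo (without -e) prints the backslash-n literally:
echo "Disk cache small enough:\n$(du -sh "$HOME/.bazel-cache")"
# printf interprets the escape, but still embeds a subshell in a string argument:
printf 'Disk cache small enough:\n%s\n' "$(du -sh "$HOME/.bazel-cache")"
```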
CHANGELOG_BEGIN
CHANGELOG_END
This is the equivalent of #8515 for Linux. There was some concern that
`bazel` would be upset at having that cache removed, so I spent a fair
amount of time trying to break it (on a Linux VM, as for some reason
`bazel` chooses not to use `~/.cache` on macOS). I could not make
`bazel` unhappy by deleting the whole thing. Deleting random files,
however, did end up producing error messages along the lines of:
```
$ bazel build //...
FATAL: corrupt installation: file '/home/vagrant/.cache/bazel/_bazel_vagrant/install/73d06d52dbf3a8e6ed43f5bf5f115eb0/embedded_tools/src/BUILD' is missing or modified. Please remove '/home/vagrant/.cache/bazel/_bazel_vagrant/install/73d06d52dbf3a8e6ed43f5bf5f115eb0' and try again.
```
which suggest busting the entire thing as a solution, so I think we're
safe here.
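The cleanup this enables can therefore be as blunt as the sketch below;
the path is Bazel's default on Linux and may differ in our setup:
```
# Remove the whole Bazel cache rather than individual files; partial deletion
# is what produces the "corrupt installation" error above.
rm -rf "$HOME/.cache/bazel"
```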
CHANGELOG_BEGIN
CHANGELOG_END
Hopefully this works around our recent CI disk space issues, and 80GB
should be large enough that this only happens about once per machine per
day, so perf shouldn't be impacted too much.
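A minimal sketch of the kind of guard this implies, assuming 80GB is a
size threshold on the local cache directory; the path and the exact
measurement are assumptions:
```
CACHE_DIR="$HOME/.bazel-cache"   # hypothetical location of the local disk cache
SIZE_GB=$(du -s --block-size=1G "$CACHE_DIR" | cut -f1)
if [ "$SIZE_GB" -gt 80 ]; then
  echo "Disk cache too big (${SIZE_GB}G); clearing $CACHE_DIR"
  rm -rf "$CACHE_DIR"
fi
```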
CHANGELOG_BEGIN
CHANGELOG_END