We've recently seen a few cases where the macOS nodes ended up not
having the cache partition mounted. So far this has only happened on
semi-broken nodes (guest VM still up and running but host unable to
connect to it), so I haven't been able to actually poke at a broken
machine, but I believe this should allow a machine in such a state to
recover.
While we haven't observed a similar issue on Linux nodes (as far as I'm
aware), I have made similar changes there to keep both scripts in sync.
CHANGELOG_BEGIN
CHANGELOG_END
I've seen reports of Artifactory returning 409 when it detects an
invalid POM file, which would map cleanly to our observed behaviour (as
other files do seem to upload fine). I'm not a POM expert so not
entirely sure how to check the actual files, but I do see one error in
the existing, commented code: the path is not a valid Maven repository
path. It should be `groupid/artifactid/version`, i.e. it is currently
missing the `artifactid` bit. So I'd like to try adding that.
I don't know how to test this without making a release, so my plan is to
make a release once this is merged. Open to suggestion on faster ways to
test this.
CHANGELOG_BEGIN
CHANGELOG_END
* Add Oracle support in the trigger service
This PR migrates the ddl & queries and adds tests for this. It does
not yet expose this to users. I’ll handle that in a separate PR.
changelog_begin
changelog_end
* use getOrElse
changelog_begin
changelog_end
base64 includes / which is at the very least pretty confusing and also
broke our cache cleanup which assumes that the cache suffix takes up
one directory. Afaict, we are not length-restricted in gcp paths so we
can just use the hex digits we get from md5
changelog_begin
changelog_end
This is adapting the same approach as #9137 to the macOS machines. The
setup is very similar, except macOS apparently doesn't require any kind
of `sudo` access in the process.
The main reason for the change here is that while `~/.bazel-cache` is
reasonably fast to clean, cleaning just that has finally caught up to us
with a recent cleanup step that proudly claimed:
```
before: 638Mi free
after: 1.2Gi free
```
So we do need to start cleaning the other one after all.
CHANGELOG_BEGIN
CHANGELOG_END
In #9169, I changed the compat jobs to not run on releases.
Unfortunately I forgot to update the `collect_build_data` job to know
about that. Hopefully after this has been merged we'll be able to rerun
\#9221.
CHANGELOG_BEGIN
CHANGELOG_END
As requested by @coceature.
Note: skipping the ts_lib job is enough to skip all compat tests because
they all depend on it, and in the Azure model if one of your
dependencies was skipped you get skipped too.
CHANGELOG_BEGIN
CHANGELOG_END
* Generate exception instances from syntax.
changelog_begin
changelog_end
* II
* III
* VII
* update ghc patch and add test
* VIII
* IX
* Remove DatatypeContexts
* X
* update stack snapshot
* don't need datatypecontexts warning anymore
* X-2
* XII
* XIII
Three issues here:
1. The release job runs on an Azure-hosted agent, so it doesn't have the
`reset_caches.sh` script (and doesn't need it).
2. The `bash-lib` step should not run if the current job has already
failed.
3. The `skip-github` jobs should also not run if the job has failed.
CHANGELOG_BEGIN
CHANGELOG_END
This is a continuation of #8595 and #8599. I somehow had missed that
`/etc/fstab` can be used to tell `mount` to let users mount some
filesystems with preset options.
This is using the full history of `mount` hardening so should be safe
enough. The option `user` in `/etc/fstab` automatically disables any kind
of `setuid` feature on the mounted filesystem, which is the main attack
vector I know of.
This works flawlessly on my local VM, so hopefully this time's the
charm. (It also happens to be my third PR specifically targeted on this
issue, so, who knows, it may even work.)
CHANGELOG_BEGIN
CHANGELOG_END
In Azure Yaml, by defualt, a step runs only if the previous step was
successful. However, that default _disappears if the step has an
explicit condition_. I believe we have a number of conditional steps
that have been written without that intention, and this is thus
restoring what I believe to be the original intention, i.e. _adding_ an
additional condition rather than _replacing_ the default one.
CHANGELOG_BEGIN
CHANGELOG_END
* Release EE SDK tarballs and installer
As before, no way of testing this. I’ll do a snapshot afterwards.
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* Rename EE artifacts
changelog_begin
changelog_end
* Move artifact publishing out of yaml files
The current publishing process pretty much hardcodes the set of
artifacts we publish in the yaml config. This is a problem because we
always release from `main` so the yaml files are always
identical. However, we will add new artifacts over time and this
starts falling apart. This PR changes this such that the process
described in the yaml files is very generic and just uploads and
downloads everything in a directory whereas the details are handled in
bash scripts that will come from the respective release branch and are
therefore version-dependent.
As usual for these type of changes, I don’t have a great way to test
this. I did do some due diligence to test that at least the artifacts
are published correctly and I can download them but I can’t test the
actual publishing.
changelog_begin
changelog_end
* Update ci/copy-unix-release-artifacts.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Update ci/copy-windows-release-artifacts.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Update ci/publish-artifactory.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* ci/cron/check: remove dade-assist calls
We can only run this in a context where the dev-env is already set up
anyway, as that's how we get Bazel to build the script in the first
place.
CHANGELOG_BEGIN
CHANGELOG_END
* remove skip_java logic
Two quick improvements I made while waiting on #9039:
- Avoid loading Java. When looking at the logs flow by this seemed to be
taking a huge amount of time.
- Isolate the gcloud config files, which allows for running gcloud
downloads in parallel.
Together these reduce the `check_releases` runtime from about 5 hours to
about 2. There's much more (and smarter) work needed on this, but this
was really easy to do.
CHANGELOG_BEGIN
CHANGELOG_END
The people who care about these alerts monitor the channel closely
enough anyway, and having frequent automated @here bells ringing makes
it harder for individuals to highlight important messages.
CHANGELOG_BEGIN
CHANGELOG_END
* Conduitify download in release file
Apologies, my ocd kicked in when I saw this in another PR.
changelog_begin
changelog_end
* fixup deps
changelog_begin
changelog_end
* i should stop working
changelog_begin
changelog_end
I think the retry is clobbering the files. Here is my theory:
- The HTTP request is lazy, i.e. it starts producing a byte stream
before it has finished downloading.
- The connection somehow crashes in the middle of that lazy handling,
possibly because the Haskell code blocks for too long on something
else and GCP thus closes the connection. (If this is true, making sure
we download the entire thing before we start writing may make the
download more reliable.) This explains why we get a "resource vanished"
and not a plain 404 to start with.
- The retry policy doesn't know anything about HTTP requests; it just
sees an IO action throwing an exception and restarts the whole thing.
- Because the IO action opens the file in Append mode, we thus end up
with a file that is too big and has its "starting bytes" multiple
times. That obviously fails to sign-check.
If this is what happens then the retry does not help at all, which does
seem to be what we've been observing (though I haven't tracked the exact
error rate too closely). The fix would likely be as simple as changing
`IO.AppendMode to IO.WriteMode (which truncates, per [documentation]).
[documentation]: https://hackage.haskell.org/package/base-4.14.1.0/docs/System-IO.html
CHANGELOG_BEGIN
CHANGELOG_END
* Merge Maven uploads for different Scala versions
It turns out Maven will abort an existing staging operation if you
create a new one. This means our jobs race against each other. We
could try to fix that by either sequencing the jobs in a clever
way (annoying and can break things like rerunning if only parts
failed), or by creating more profiles (unclear if you can even have
two profiles for the same group id, even if you do, it’s annoying to
merge).
So in this PR I (grudgingly) merged both uploads into the Haskell
script. This isn’t all bad:
1. It moves some logic from bash embedded in yaml string literals into
Haskell code.
2. It duplicates some versions but it removes duplication in other
places so overall not too much worse.
3. It does however, make things slower. We don’t run this stuff in
parallel. That said, the release step is relatively small (< 5min) and
it only runs on Linux.
We could add CLI arguments to make the Scala versions configurable for
local development. Given that this is blocking releases, I wanted to
get something in that works first and then see what we need in that regard.
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* .
changelog_begin
changelog_end
* .
changelog_begin
changelog_end