Commit Graph

211 Commits

Author SHA1 Message Date
Moisés Ackerman
c8d0bc4ffc
infra/azure.tf: add /root/get-targets.sh script to select CI scaling targets (#18052)
This is used to modify the scaling targets during the holiday break while avoiding cron infelicities.
2024-01-08 13:27:24 +00:00
Gary Verhaegen
faf1604308
infra: remove stale GCP resources (#18082)
2024-01-03 17:32:22 +01:00
Moisés Ackerman
4ecee766b6
Revert "ci holiday: even CI nodes deserve a break (#18045)" (#18051)
This reverts commit 5a0f0a18af.
2023-12-19 18:03:39 +00:00
Gary Verhaegen
5a0f0a18af
ci holiday: even CI nodes deserve a break (#18045)
This updates the daily-reset cron to make a special case of December 23
through 31 and essentially count those days as weekend days.
2023-12-15 18:05:53 +01:00
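The rule described in this commit (treat December 23 through 31 like weekend days) can be sketched as a small predicate. This is only an illustration in Python, not the cron change itself; the function name `counts_as_weekend` is hypothetical.

```python
from datetime import date


def counts_as_weekend(day: date) -> bool:
    """Return True for Saturdays, Sundays, and December 23 through 31."""
    # date.weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
    is_weekend = day.weekday() >= 5
    # The holiday break: December 23 through 31, inclusive.
    is_holiday_break = day.month == 12 and 23 <= day.day <= 31
    return is_weekend or is_holiday_break
```

A scheduler could call this once per day and pick the reduced node count whenever it returns `True`.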
Gary Verhaegen
e8ef96fdc5
infra: update gcp permissions (#17778)
2023-11-08 11:35:27 +00:00
Gary Verhaegen
55f387e8d3
infra: rebuild macOS nodes (#17600)
2023-10-19 18:39:13 +02:00
Gary Verhaegen
ee08d89f13
infra: pin nix (#17484)
2023-09-27 13:32:23 +02:00
Gary Verhaegen
b9e8d94d1f
infra: add Production tag to daml-ci (#17476)
2023-09-25 15:21:14 +02:00
Gary Verhaegen
4bf7693f42
fix cache (#17377)
Looks like the issue was Bazel 5.2.0.
2023-09-08 10:31:42 +02:00
Gary Verhaegen
a9e579e278
infra: let CI nodes sleep at night (#17150)
Instead of always having all the nodes up, this tweaks the CI
infrastructure to reduce the number of nodes "at night" and over the
weekend.
2023-07-19 14:27:06 +02:00
Gary Verhaegen
5f3e1d18be
hoogle: bump nix (#17124)
2023-07-13 13:39:39 +02:00
Gary Verhaegen
566eb3bad5
infra: faster startup for ubuntu machines (#17121)
2023-07-13 11:34:49 +00:00
Remy
7b7a6f6211
Reactivate Windows tests disabled in #16760 (#16958)
namely:
-  //daml-script/export/integration-tests/reproduces-transactions:test
-  //compiler/damlc:daml-stdlib-doctest

Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
2023-06-20 16:28:50 +02:00
Gary Verhaegen
9d228985ff
reduce cache retention times (#16964)
2023-06-07 16:44:01 +02:00
Gary Verhaegen
2b911c1d9c
fix Windows (#16950)
2023-06-01 17:05:04 +02:00
Gary Verhaegen
fc7c99a8a1
infra: tweak Azure config (#16873)
- Change machine types to be closer to GCP profiles.
- Set partition sizes correctly on Windows (bigger disks mean little if
  partitions don't match).
- Enable full caching on disk access as the only downside seems to be
  "risk of data loss if the machine shuts down unexpectedly", and we
  never want a machine to come back if it shut down.
2023-05-17 11:55:10 +02:00
Gary Verhaegen
df791fc87d
infra: increase Windows disk sizes (#16812)
2023-05-05 18:53:27 +02:00
Gary Verhaegen
af51d6601c
infra: multiple IPs for NAT gateway (#16803)
2023-05-04 17:59:47 +00:00
Gary Verhaegen
5385724e29
clean up Windows machines (#16788)
2023-05-03 13:20:34 +00:00
Gary Verhaegen
e657624aa7
change Azure subscription (#16741)
2023-04-21 18:16:43 +02:00
Gary Verhaegen
b149ffa8b8
Windows clean up (#16723)
* shut down GCP windows nodes
* shut down periodic-killer
* cycle Windows nodes in daily-reset
2023-04-20 17:06:26 +02:00
Gary Verhaegen
99821e0b66
infra: add a Windows node on Azure (#16705)
2023-04-20 08:32:25 -04:00
Gary Verhaegen
c67ffe403e
remove ubuntu nodes on reset (#16703)
2023-04-19 11:10:00 +00:00
Gary Verhaegen
40f6cb26dc
infra: move Ubuntu to Azure (#16690)
The daily restart is now working, I think we can switch over. I'm
keeping the GCP configuration around for the time being in case we need
to roll back for some reason; if everything goes smoothly I'll remove
all of it in a month or so.
2023-04-13 09:21:38 -04:00
Gary Verhaegen
23ee54ab7b
infra: fix inadvertent shadowing in reset-azure (#16678)
2023-04-11 17:28:23 +02:00
Gary Verhaegen
36a5d2632a
infra: add daily-reset for azure nodes (#16676)
2023-04-10 23:52:35 +02:00
Gary Verhaegen
b54bdcf00d
fix small oversight in azure config (#16629)
2023-03-30 13:39:47 +02:00
Gary Verhaegen
7d69a5975c
ci: set up Ubuntu nodes on Azure (#16610)
2023-03-29 15:21:14 +02:00
Gary Verhaegen
d817383c87
infra: larger virtual disks (#16498)
I've seen a few "disk full" errors recently on CI, and they don't quite
make sense because we do have this "reset cache" step that looks at
available disk space and cleans up if needed.

So I dug a little bit more and found this discrepancy: the real hard
drives are 400g, the virtual disks are 200g. So what may happen is we
still have plenty of space on the real drives (well, they're virtual,
but bear with me), while the virtual ones are full. And since the
clean-up step only looks at the free space on the real drives, it goes
ahead without cleaning up and then when we try to download stuff there's
no free space on the virtual drives.

This didn't use to be a problem because the size of the virtual drives
mapped to the size of the real ones. This PR aims at restoring that
equivalence.
2023-03-09 17:50:55 +00:00
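The failure mode described in this commit (a clean-up check that looks at only one of two disks) can be illustrated with a short Python sketch. Only the 400g and 200g sizes come from the commit message; the free-space figures, the threshold, and the names `free_gib` and `cleanup_needed` are assumptions made up for the example. This is not the actual "reset cache" step.

```python
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def cleanup_needed(free_by_disk: dict, checked: list, threshold_gib: float) -> bool:
    """Return True if any disk we actually look at is below the threshold."""
    return any(free_by_disk[name] < threshold_gib for name in checked)


# Hypothetical snapshot: the 400g disk has room, the 200g one is nearly full.
snapshot = {"real": 250.0, "virtual": 4.0}

# Looking only at the real disk misses the problem...
only_real = cleanup_needed(snapshot, ["real"], threshold_gib=50.0)
# ...while looking at both disks would trigger the clean-up.
both = cleanup_needed(snapshot, ["real", "virtual"], threshold_gib=50.0)
```

With these numbers `only_real` is `False` and `both` is `True`. The PR sidesteps the issue differently, by making the virtual disks as large as the real ones so that one check covers both.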
Gary Verhaegen
518248425a
infra: document token rotation (#16461)
2023-03-06 13:50:55 +00:00
Gary Verhaegen
9b50c9f10c
[infra] revert logging change (#16434)
Reverting #16393 now that Google has fixed things on their side.
2023-03-01 23:01:33 +00:00
Gary Verhaegen
9432f13823
ci: fix Linux nodes (#16393)
2023-02-27 09:34:14 -05:00
Gary Verhaegen
7fb303be3b
update CI node reset permissions (#16216)
2023-02-02 09:06:39 +01:00
Gary Verhaegen
7e59c7e74c
revert #16041 (#16059)
Signatures have been updated, we don't need delete permissions anymore.
2023-01-13 15:40:48 +01:00
Gary Verhaegen
5339642602
infra: add delete permission to assembly account (#16041)
2023-01-11 19:42:15 +00:00
Gary Verhaegen
151e12b81a
bump copyright (#16002)
This is the result of:

- Updating `./COPY` to say `2023`.
- Running `./dev-env/bin/dade-copyright-headers update .`
2023-01-04 18:21:15 +01:00
Gary Verhaegen
65018bcd28
remove Victor's access to data bucket (#15629)
2022-11-24 14:47:37 +01:00
Gary Verhaegen
382b091f77
ci: document node handling a bit more (#15339)
CHANGELOG_BEGIN
CHANGELOG_END
2022-10-26 11:03:27 +02:00
Edward Newman
8e44f3b8bf
M1 build setup using Packer & Tart (#14635)
* M1 build setup using Packer

* Add change log

CHANGELOG_BEGIN
CHANGELOG_END

* Update infra/macos/m1-build/init-2.sh

Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
2022-08-17 07:57:22 -04:00
Gary Verhaegen
8ca7af3030
onboarding: add Chun Lok to release rotation (#14593)
And to our infrastructure's notion of "the ledger clients team"
(obviously abbreviated to "appr").

CHANGELOG_BEGIN
CHANGELOG_END
2022-08-03 12:05:28 +02:00
Gary Verhaegen
bf64e705e9
ci: unpin Docker (#14464)
When we set this up years ago (#1566), it was a way to get a more recent
Docker version than was then available through the default Ubuntu 16.04
apt repository. Nowadays, this actually makes us lag behind, to the
point where the 2.3.1 image isn't building.

CHANGELOG_BEGIN
CHANGELOG_END
2022-07-18 17:52:28 +00:00
Gary Verhaegen
9aec01853b
ci: unpin Ubuntu image (#14143)
Cleaning up after #14126

CHANGELOG_BEGIN
CHANGELOG_END
2022-06-22 13:56:05 +02:00
Gary Verhaegen
feb53f96c1
infra: tighten TLS security (#14239)
This tightens our TLS configuration a bit, mostly by dropping support
for SSL3, TLS1.0 and TLS1.1 on https://hoogle.daml.com,
https://bazel-cache.da-ext.net, https://nix-cache.da-ext.net and the
daml-binaries front (which I don't think we still use).

CHANGELOG_BEGIN
CHANGELOG_END
2022-06-21 14:37:24 +00:00
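As a client-side counterpart to this change, here is a minimal Python sketch of a TLS context that refuses the same protocol versions the commit drops (SSL3, TLS 1.0 and TLS 1.1). It is not the server configuration from the PR, and the function names are hypothetical. It relies only on the standard `ssl` and `socket` modules.

```python
import socket
import ssl
from typing import Optional


def strict_client_context() -> ssl.SSLContext:
    """Build a client context that refuses SSL3, TLS 1.0 and TLS 1.1."""
    ctx = ssl.create_default_context()
    # Anything older than TLS 1.2 is rejected during the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_version(host: str, port: int = 443, timeout: float = 10.0) -> Optional[str]:
    """Connect to `host` and return the agreed protocol version string."""
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version` with one of the hostnames listed in the commit message would report which version the server and this client settle on; it needs network access, so it is left out of any automated check here.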
Gary Verhaegen
d81b4a7071
ci: pin Linux image to yesterday's because today's is broken (#14126)
Running the `docker` command on today's Ubuntu images crashes the
kernel. (Which is super reassuring from a security pov.)

CHANGELOG_BEGIN
CHANGELOG_END
2022-06-08 14:08:23 +00:00
Gary Verhaegen
eefe285f67
remove Stewart from release rotation (#14018)
CHANGELOG_BEGIN
CHANGELOG_END
2022-05-30 17:09:40 +00:00
Gary Verhaegen
dfa648f585
hunt down DAML better (#13195)
Process:

- `git ls-files -z | xargs -0 -n 100 sed -i --follow-symlinks 's/DAML/Daml/g'`
- `git add -p`
- `git restore -p`
- Check there is no unstaged change left.

To review:

- Check for false positives by carefully reviewing the diff in this PR.
- Check for false negatives with `git grep DAML`.
- Quicker check for false positives:

```
git grep DAML | grep -v migration | grep -v DAML_
```

Fixes #13190

Note: This is the "second half" of #13191, which failed to cover all the
remaining DAMLs because of:

```
$ git ls-files | grep "'"
compiler/damlc/tests/daml-test-files/MangledScenario'.daml
```

CHANGELOG_BEGIN
CHANGELOG_END
2022-03-08 17:04:58 +01:00
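The quick filter from this commit (`git grep DAML | grep -v migration | grep -v DAML_`) can be mirrored in Python for use in a script. This is a sketch only: the function name and the sample lines are made up, and the input is assumed to be `git grep` output in its usual `path:text` form.

```python
def remaining_daml_lines(lines):
    """Keep lines that mention DAML but contain neither 'migration' nor 'DAML_'."""
    return [
        line
        for line in lines
        # Same three conditions as the grep pipeline, applied to the whole
        # output line, so a match in the path also excludes the line.
        if "DAML" in line and "migration" not in line and "DAML_" not in line
    ]


# Hypothetical `git grep DAML` output.
sample = [
    "docs/intro.rst:Welcome to DAML",
    "db/migration/V1.sql:-- DAML ledger tables",
    "build.sh:export DAML_HOME=/opt/daml",
    "README.md:Daml is a smart contract language",
]
leftover = remaining_daml_lines(sample)
```

With the sample above, only the `docs/intro.rst` line survives the filter.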
Gary Verhaegen
53557dd7de
shut down ElasticSearch (#13151)
The cluster shuts down about once every two weeks and takes a couple of
hours to get back up. It has been off for a few days now and, as far
as I'm aware, nobody noticed.

My personal assessment is that this is costing us more in maintenance
(not to mention running) costs than what we're getting out of it.

CHANGELOG_BEGIN
CHANGELOG_END
2022-03-04 17:14:15 +01:00
Gary Verhaegen
091a5ac752
appr: add Stewart (#13116)
CHANGELOG_BEGIN
CHANGELOG_END
2022-03-01 23:11:54 +00:00
Gary Verhaegen
fe9d44ffe7
ci: bump Nix on macOS nodes (#13061)
However that happened, we were stuck with Nix 2.3.15 (or 2.3.16 in some
cases) on our macOS nodes. This PR is a minor edit to the Nix
initialization commands to switch from 2.4 to "latest", but I will also
use it to record the changes I just made manually to the cluster.

The cluster is currently composed of two parts:
- 7 machines running Catalina (10.15.7).
- 1 machine running Monterey (12.2).

Unfortunately, they use different setups. The Catalina ones are described
by the state of the repo (in theory, though keeping them in sync is
manual); in order to update those, I have:

1. Taken one node off the CI pool (`builder1epjj7`).
2. On that node, run the following commands:
   ```
   cd ~/daml/infra/macos/3-running-box
   vagrant destroy -f
   rm ~/images/*
   vagrant box remove macinbox
   vagrant box remove azure-ci-node
   rm -r ~/.vagrant.d/boxes/macinbox-06032020.tar.gz
   softwareupdate -d --fetch-full-installer --full-installer-version 10.15.7
   cd ~/daml/infra/macos/1-create-box
   sudo macinbox --box-format vmware_desktop --disk 250 --memory 32768 --cpu 10 --user-script user-script.sh
   cd ../2-common-box
   vagrant up
   vagrant package --output ~/images/initialized-$(date +%Y%m%d).box
   vagrant destroy -f
   cd
   ./run-agent.sh
   ```
   This leaves us with that node running an updated box. The new box is
   in `~/images/initialized-$(date +%Y%m%d).box`.
3. Send that file to all the other nodes with `scp`.
4. Reboot all the nodes (after deactivating & waiting for jobs to
   finish).

For the Monterey node, images (steps 1 and 2 in this repo) are currently
created by @nycnewman on another machine I don't have access to, so I
took a slightly different approach: I took the existing image, started
it from the `3-running-box` folder as usual, manually updated Nix there,
then repackaged that.

CHANGELOG_BEGIN
CHANGELOG_END
2022-02-24 01:04:28 +00:00
Gary Verhaegen
583cad5fd6
Fix tf (#13028)
Goals:

- Reflect manual changes from #12996 in Terraform.
- Reflect manual changes from #12997 in Terraform.
- Update plugins to work with #12926.
- Keep running services working through the changes.

Details in commits.

CHANGELOG_BEGIN
CHANGELOG_END
2022-02-22 18:33:21 +00:00