39 Commits

Author SHA1 Message Date
Gary Verhaegen
0a251b3fa5
switch CI nodes to permanent (#4455)
CHANGELOG_BEGIN
CHANGELOG_END
2020-02-11 02:07:42 +01:00
Gary Verhaegen
1681922f90
ci: temp machines for scheduled killing experiment (#4386)
* ci: temp machines for scheduled killing experiment

Based on our discussions last week, I am exploring ways to move us to
permanent machines instead of preemptible ones. This should drastically
reduce the number of "cancelled" jobs.

The end goal is to have:

1. An instance group (per OS) that defines the actual CI nodes; this
would be pretty much the same as the existing ones, but with
`preemptible` set to false.
2. A separate machine that, on a cron (say at 4AM UTC), destroys all the
CI nodes.

The hope is that the group managers, which are set to maintain 10 nodes,
will then recreate the "missing" nodes using their normal starting
procedure.
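
The selection step of that plan can be sketched as follows. This is a minimal, hypothetical illustration in Python, not the script used in this PR: the `ci-` name prefix, the shape of the instance records, the sample zone, and the helper name `delete_commands` are all assumptions. Only the `gcloud compute instances delete` invocation with `--zone` and `--quiet` is an existing CLI call. The function only builds the commands, so it can be inspected without touching any machine.

```python
from typing import Dict, List


def delete_commands(instances: List[Dict[str, str]], prefix: str = "ci-") -> List[List[str]]:
    """Build one `gcloud` delete command per instance whose name starts with `prefix`.

    `instances` is assumed to be a list of {"name": ..., "zone": ...} records;
    the "ci-" prefix is a hypothetical naming convention for CI nodes.
    """
    commands = []
    for instance in instances:
        if instance["name"].startswith(prefix):
            commands.append([
                "gcloud", "compute", "instances", "delete", instance["name"],
                "--zone", instance["zone"],
                "--quiet",  # skip the interactive confirmation prompt
            ])
    return commands


if __name__ == "__main__":
    # Hypothetical inventory: one CI node and the killer machine itself.
    nodes = [
        {"name": "ci-linux-1", "zone": "us-east1-b"},
        {"name": "killer", "zone": "us-east1-b"},
    ]
    for command in delete_commands(nodes):
        print(" ".join(command))
```

With a prefix filter like this, the killer machine would not match and so would survive its own run; whether the group managers then recreate the deleted nodes is exactly what the experiment is meant to find out.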

However, there are a lot of unknowns I would like to explore, and I need
a playground for that. This is where this PR comes in. As it stands, it
creates one "killer" machine and a temporary group manager. I will use
these to experiment with the GCP API in various ways without interfering
with the real CI nodes.

This experimentation will likely require multiple `terraform apply` with
multiple different versions of the associated files, as well as
connecting to the machines and running various commands directly from
them. I will ensure all of that only affects the new machines created as
part of this PR, and therefore believe we do not need to go through a
separate round of approval for each change.

Once I have finished experimenting, I will create a new PR to clean up
the temporary resources created with this one and hopefully set up a
more permanent solution.

CHANGELOG_BEGIN
CHANGELOG_END

* add missing zone for killer instance

* add compute scope to killer

* authorize Terraform to shutdown killer to update it

* change in plans: use a service account instead

* .

* add compute.instances.list permission

* add compute.instances.delete permission

* add cron script

* obligatory round of extra escaping

* fix PATH issue & crontab format

* smaller machine & less frequent reboots
2020-02-07 21:04:03 +01:00
Gary Verhaegen
852fc7cd1a
remove temp debug ci nodes (#4373)
Following the happy resolution of #4370 in #4371, we do not need the
temporary nodes anymore. This PR therefore removes them.

CHANGELOG_BEGIN
CHANGELOG_END
2020-02-04 15:54:03 +01:00
Gary Verhaegen
5606ab350c
fix Windows CI node startup script (#4371)
This is an attempt to apply a potential fix discovered as part of the
investigation in #4370. The issue seems to be that Chocolatey is using a
protocol deemed not secure enough and disabled in recent Windows images
(our node creation script dynamically selects the latest "Windows 2016"
server image from GCP).

CHANGELOG_BEGIN
CHANGELOG_END
2020-02-04 14:37:53 +01:00
Gary Verhaegen
48f39beda2
add Windows debug machine (#4370)
Today we don't have any Windows machine in the CI pool. The machine
template has not changed since 2019-11-21, yet as of today when the
machine starts GCP proudly declares

> GCEMetadataScripts: No startup scripts to run.

despite the script being defined as `sysprep-specialize-script-ps1`, as
per the
[documentation](https://cloud.google.com/compute/docs/startupscript).
Also, it used to work and we haven't changed anything.

I'm not quite sure what's going on and how to investigate, but I think
at the very least we can try to unblock the team by having a set of
machines we initialize manually. This PR is meant to do that.

This is the same changeset as a877491139
and 16da700532, except that it now
specifies 5 machines instead of just one.

CHANGELOG_BEGIN
CHANGELOG_END
2020-02-04 14:30:56 +01:00
Gary Verhaegen
6233f66ff6
remove debug Windows machine (#4267)
CHANGELOG_BEGIN
CHANGELOG_END
2020-01-29 18:07:53 +01:00
Gary Verhaegen
16da700532
temporary Windows machine for Andreas (#4165)
The recent changes to the way in which we build npm packages with Bazel
have caused a lot of issues on Windows. To debug those, Andreas has
requested a temporary machine.

This is pretty much an exact replica of #3294 (a87749113), with the same
plan:

1. I run terraform apply once this PR is merged.
2. I manually, through the GCP web console, set a dummy password for that
  machine's RDP connection and transmit that to @aherrmann-da through
  Slack.
3. @aherrmann-da debugs the issue.
4. I create a PR to roll back this one, then apply it once it's merged.

Note: I have verified that master applies cleanly prior to opening this
PR.

CHANGELOG_BEGIN
CHANGELOG_END
2020-01-22 19:10:01 +01:00
Gary Verhaegen
878429e3bf
update copyright notices to 2020 (#3939)
copyright update 2020

* update template
* run script: `dade-copyright-headers update .`
* update script
* manual adjustments
* exclude frozen proto files from further header checks (by adding NO_AUTO_COPYRIGHT files)
2020-01-02 21:21:13 +01:00
Gary Verhaegen
07074a4759
remove Windows debug machine (#3451) 2019-11-13 18:33:15 +01:00
Gary Verhaegen
d4c38a3763
add gcs bucket for ledger dumps (#3374) 2019-11-07 14:41:15 +00:00
Gary Verhaegen
62dcbd86b5 pin hoogle version to avoid surprises (#3322) 2019-11-05 18:14:29 +00:00
Gary Verhaegen
a877491139
temporary Windows CI instance for debugging (#3294)
Create a temporary CI machine that looks just like the real ones specifically for debugging.
2019-11-04 11:52:27 +01:00
Gary Verhaegen
13e6f581e3
fix hoogle; revert cache buckets ACL changes (#3062) 2019-09-27 15:42:31 +01:00
Gary Verhaegen
99ea93168d
update copyright notices (#2499) 2019-08-13 17:23:03 +01:00
Gary Verhaegen
bf5995f529
remove mentions of da-int servers (#2485) 2019-08-12 10:42:41 +01:00
Florian Klink
14ecfd7bae infra: add acls for google_storage_objects create via tf (#2460)
This ensures objects in the google storage bucket created by terraform
have the proper publicRead acl.
2019-08-08 19:13:15 +02:00
Gary Verhaegen
36070476c3 collect historical download data (#2003) 2019-07-04 11:23:51 +00:00
Florian Klink
1cd5bb2492 infra: move index.html outside gcp_cdn_bucket module (#1716)
* infra: gcp_cdn_bucket: update comment

The cache retention can be configured, while the comment suggests it's
hardcoded.

* infra: don't create index.html inside gcp_cdn_bucket module

We might want to add a different index.html per bucket, so move that
code outside the module and into the bucket-specific terraform files.

Also add bucket-specific index.html files.
2019-07-02 11:14:21 +01:00
Gary Verhaegen
a1424d3446 add autohealing to hoogle cluster (#1906) 2019-06-27 05:46:01 +00:00
Gary Verhaegen
18aee24e0f fix hoogle cron escaping (#1902) 2019-06-26 18:42:23 +00:00
Gary Verhaegen
31171ec6b6 terraform files for hoogle server (#1660) 2019-06-22 00:15:52 +00:00
Bolek@DigitalAsset
1a62841616 infra: add docker daemon to ci agent (#1566)
* installs docker and adds vsts user to docker group
2019-06-08 22:31:55 +00:00
Gary Verhaegen
4120ef2d1b [linux/ci] fix logging agent (#1356)
There are two issues with the current setup:

- iptables entry prevents connecting to the metadata server, and
- machines are given insufficient permissions.
2019-05-30 15:36:57 +00:00
Gary Verhaegen
ac719e7927 [ci/linux] keep daml copy until it's actually not needed anymore (#1349)
The existing script is deleting the daml directory too early, leading to
the "shutdown agents" step failing.
2019-05-23 15:25:37 +00:00
Gary Verhaegen
c762d491ea target s3 bucket with docs refresh script (#1287)
There is no simple way to configure GCS to serve the desired security
headers, so instead the script will keep updating the existing s3
bucket.

Consequent changes:

- Add aws cli tool to dev-env
- Remove docs bucket from Terraform
2019-05-21 22:26:07 +00:00
Gary Verhaegen
be2457cc6a [ci/linux] restart fluentd after installing (#1290)
It looks like the curl command is currently installing but not starting the service that is supposed to send logs to StackDriver. When connecting to the machines manually, a call to `restart` seems to fix it.
2019-05-21 21:37:51 +00:00
Moritz Kiefer
1cfa27d616
Install the Windows SDK on CI nodes (#1272)
This provides signtool.exe which we need to sign our Windows installer.
2019-05-21 13:42:49 +02:00
Brian Hansen
f9bb85a5a7 remove -O option from curl command in order to pipe script contents t… (#953)
* remove -O option from curl command in order to pipe script contents to bash

* follow redirects for stackdriver

Co-Authored-By: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
2019-05-15 18:33:01 +00:00
Gary Verhaegen
a244579470 set default page for docs (#1102)
This mirrors the current behaviour of docs.daml.com.
2019-05-13 22:34:21 +00:00
Gary Verhaegen
5ab5ced2e3 add GCS bucket for docs (#1062)
This is a first step towards improving our docs release process. The
goal here is to get rid of the manual "publish docs" step. This is done
as a periodic check because we only want to run this for "published"
releases, i.e. the ones that are not marked as prerelease. Because the
act of publishing a release is a manual step that Azure cannot trigger
on, we instead opt for a periodic check.
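
The "published releases only" filter could look like the sketch below. It assumes release records shaped like those returned by the GitHub releases API, which carry boolean `prerelease` and `draft` fields and a `tag_name`. The helper name and the sample tags are hypothetical, and this is not the script added by this commit.

```python
from typing import Dict, List


def published_tags(releases: List[Dict]) -> List[str]:
    """Return the tags of releases that are neither drafts nor prereleases."""
    return [
        release["tag_name"]
        for release in releases
        if not release.get("prerelease", False) and not release.get("draft", False)
    ]


if __name__ == "__main__":
    # Hypothetical sample data; real records would come from the releases API.
    sample = [
        {"tag_name": "v0.12.16", "prerelease": False, "draft": False},
        {"tag_name": "v0.12.17", "prerelease": True, "draft": False},
    ]
    print(published_tags(sample))
```

A periodic job would run such a filter on each tick and publish docs only for the tags it returns.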

Not included in this piece of work:
- Any change to the docs themselves; the goal here is to automate the
current process as a first step. Future plans for the docs themselves
include adding links to older versions of the docs.
- A better way to detect docs are already up-to-date, and abort if so.
- Including older versions of the docs.
- Switching the DNS record from the current AWS S3 bucket to this new
GCS bucket. That will be a manual step once we're happy with how the
new bucket works.
2019-05-11 03:27:17 +00:00
Gary Verhaegen
e95575b033 install StackDriver on build machines (#905)
Requested by Security
2019-05-04 22:55:51 +00:00
Florian Klink
56c322c982 infra: add some docs / comments (#796)
* infra: document google_storage_bucket_iam_member resources

* infra: document nix-cache-info file

* infra: document who's maintaining the DA ext certificate

* infra: README: mention azure pipeline agents

* infra: README: IT -> DA IT
2019-05-01 15:54:09 +00:00
Jonas Chevalier
769c04d3ba infra: reduce differences with hosted (#698) 2019-04-25 20:49:38 +00:00
Jonas Chevalier
3b8ae1ff86 infra: add VSTS Windows agents (#368) 2019-04-18 11:20:57 +00:00
Jonas Chevalier
16aba583ce
CI linux agent changes (#509)
* ci: always use the linux-pool

reduce the difference of environment between external and internal
contributions

* infra: tweak the linux cache warmup script

Don't share the same bazel cache directory with the disk cache, which is
something else. Be more specific about the target. Clean after yourself.

* infra: bump the linux agent disk to 200GB

avoid running out of disk space
2019-04-16 11:35:46 +02:00
Florian Klink
5f75e9d1a0 infra/vsts_agent_linux_startup.sh: warm up local caches, purge old agents (#438)
Warm up local caches by building dev-env and current daml master. This is
allowed to fail, as we still want to have CI machines around, even when
their caches are only warmed up halfway.

Afterwards, we purge old agents that might still be around because they
didn't unregister themselves.
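
A hedged sketch of the purge selection: given agent records with a `name` and a `status`, pick the ones that are still registered but offline, and therefore presumably failed to unregister. The record shape, the `"offline"` value, and the function name are assumptions for illustration; this is not the content of `purge_old_agents.py`.

```python
from typing import Dict, List


def agents_to_purge(agents: List[Dict[str, str]]) -> List[str]:
    """Return names of agents that are still registered but no longer online."""
    return [agent["name"] for agent in agents if agent.get("status") == "offline"]


if __name__ == "__main__":
    # Hypothetical pool listing: one live agent and one stale registration.
    pool = [
        {"name": "vsts-agent-linux-1", "status": "online"},
        {"name": "vsts-agent-linux-2", "status": "offline"},
    ]
    print(agents_to_purge(pool))
```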

This depends on #402 being merged, as otherwise purge_old_agents.py
cannot be found.
2019-04-12 16:47:36 +02:00
Jonas Chevalier
6f90fda6d1
infra: VSTS agent improvements (#369)
* infra: replace the debian image by ubuntu 16.04

be closer to what the azure vmImage is using

* infra: limit access to the PAT token
2019-04-11 17:11:14 +02:00
zimbatm
430a85649c add more Azure Pipeline agents (#230)
* nix: add more providers to terraform
* docs: make tarballs more reproducible
* ci: use the linux-pool pool
* ci: tweak the nix installation

handle the case where the user is root and on ubuntu

* infra: terraform fmt

* infra: add Azure Pipeline agents

* ci: only enable linux-pool for internal PRs
2019-04-09 18:59:37 +02:00
Digital Asset GmbH
05e691f558 open-sourcing daml 2019-04-04 09:33:38 +01:00