digital-asset/daml

mirror of https://github.com/digital-asset/daml.git synced 2024-09-20 01:07:18 +03:00

Author	SHA1	Message	Date
Gary Verhaegen	819210827e	fix permissions on periodic-killer (#5307) Even though the command succeeds as far as deleting the machine goes, it does log an error. That is probably why we recently had only one machine deleted per night. Something must have changed on the Google side recently to make this additional permission required. CHANGELOG_BEGIN CHANGELOG_END	2020-03-31 19:04:40 +02:00
Gary Verhaegen	38a5fea7a0	tweak periodic-killer (#5268) 1. Google says the instance is currently overutilized and suggests g1-small as a more appropriate size. 2. It occurred to me that the reason no error was logged might be that we lose them, so explicitly redirecting stderr too. CHANGELOG_BEGIN CHANGELOG_END	2020-03-30 14:12:14 +02:00
Gary Verhaegen	7e960eb454	log periodic reboots (#5235) It appears that most of our Windows machines have not been rebooted since Tuesday 24. We detected this because one of them has run out of disk space. This is not good, but what's worse is I currently have no idea what could be going wrong, and we are not logging anything at all in the current setup, so even ssh'ing into the machine provides no insight. This PR hopefully addresses that by: 1. Redirecting the outputs of the script to a file, and 2. `tail`ing that file from the startup script, so the logs will appear directly in the GCP web console. (This is what we currently do for the Azure agent logs on Linux.) This PR also tells the script to not stop on the first failed machine and keep trying. CHANGELOG_BEGIN CHANGELOG_END	2020-03-27 21:35:49 +01:00
Gary Verhaegen	1872c668a5	replace DAML Authors with DA in copyright headers (#5228) Change requested by Manoj. CHANGELOG_BEGIN CHANGELOG_END	2020-03-27 01:26:10 +01:00
Gary Verhaegen	0a251b3fa5	switch CI nodes to permanent (#4455) CHANGELOG_BEGIN CHANGELOG_END	2020-02-11 02:07:42 +01:00
Gary Verhaegen	1681922f90	ci: temp machines for scheduled killing experiment (#4386) * ci: temp machines for scheduled killing experiment Based on our discussions last week, I am exploring ways to move us to permanent machines instead of preemptible ones. This should drastically reduce the number of "cancelled" jobs. The end goal is to have: 1. An instance group (per OS) that defines the actual CI nodes; this would be pretty much the same as the existing ones, but with `preemptible` set to false. 2. A separate machine that, on a cron (say at 4AM UTC), destroys all the CI nodes. The hope is that the group managers, which are set to maintain 10 nodes, will then recreate the "missing" nodes using their normal starting procedure. However, there are a lot of unknowns I would like to explore, and I need a playground for that. This is where this PR comes in. As it stands, it creates one "killer" machine and a temporary group manager. I will use these to experiment with the GCP API in various ways without interfering with the real CI nodes. This experimentation will likely require multiple `terraform apply` with multiple different versions of the associated files, as well as connecting to the machines and running various commands directly from them. I will ensure all of that only affects the new machines created as part of this PR, and therefore believe we do not need to go through a separate round of approval for each change. Once I have finished experimenting, I will create a new PR to clean up the temporary resources created with this one and hopefully set up a more permanent solution. CHANGELOG_BEGIN CHANGELOG_END * add missing zone for killer instance * add compute scope to killer * authorize Terraform to shutdown killer to update it * change in plans: use a service account instead * . * add compute.instances.list permission * add compute.instances.delete permission * add cron script * obligatory round of extra escaping * fix PATH issue & crontab format * smaller machine & less frequent reboots	2020-02-07 21:04:03 +01:00

6 Commits