## Introduction

While our Linux and Windows machines use standard cloud infrastructure, and as such can be created entirely from Terraform scripts, our macOS nodes must be managed at a more physical level. This folder contains all the instructions needed to create a virtual macOS machine (running on a physical macOS host) that can be added to our Azure pool.

There are a few pieces to this puzzle:

  1. Instructions to create a base Vagrant box. This only needs to be done once per Apple-supplied macOS installer version; the resulting base box can be shared with new machines by simply copying a folder (a sketch of that copy follows this list).
  2. Vagrantfile and init script; this is the piece that will, from the box defined above, start up a brand new macOS VM, install everything we need, run a complete build of the project, and then connect to Azure and wait for CI build requests.
  3. Additional considerations, discussed below.
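
As an illustration of step 1, sharing the base box between hosts amounts to copying Vagrant's box directory. This is only a sketch under assumptions: `macos-base` is a placeholder for whatever box name the 1-create-box instructions produce, and the paths rely on Vagrant's default box location under `~/.vagrant.d/boxes`.

```
# Copy an already-built base box to a new macOS host. The target directory
# must already exist on the new host; "macos-base" is a placeholder name.
rsync -a ~/.vagrant.d/boxes/macos-base/ new-host:.vagrant.d/boxes/macos-base/

# On the new host, the box should then show up as available:
vagrant box list
```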

## Security considerations

The guest machine is created with a user, vagrant, that has passwordless sudo access and can be accessed with the default, well-known Vagrant SSH "private" key. While this is useful for debugging, it is crucial that the SSH port of the guest machine MUST NOT be accessible from outside the host machine, and that access to the host machine itself be appropriately restricted.
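
One way to double-check that requirement, assuming the default VirtualBox NAT networking that Vagrant sets up (the port number below is the usual Vagrant default, not something this repo guarantees), is to confirm that the forwarded SSH port is bound to the host's loopback interface only:

```
# Show the address and port Vagrant actually uses to reach the guest.
vagrant ssh-config

# On the macOS host, check which interface the forwarded port (2222 is the
# usual Vagrant default) is listening on. It should be 127.0.0.1 only;
# a "*.2222" entry would mean the guest's SSH port is reachable from the
# network.
netstat -an | grep LISTEN | grep 2222
```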

My personal recommendation would be for the host machines not to be accessible from any network at all, and instead to be managed through physical access, if possible.

The init.sh script creates a vsts user with more restricted access to run the CI builds.

## Machine initialization

The Vagrantfile for CI nodes will read the init script directly from the master branch on GitHub. This means that any change to the init script for the macOS machines constitutes a "Standard Change", just like changes to the Linux and Windows scripts. Note that this should already be enforced by virtue of the init.sh file being under the already-monitored-by-CI folder //infra.
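
In other words, provisioning boils down to fetching whatever init.sh is on master at boot time and running it in the guest. The raw URL below is an assumption based on this folder's layout; verify it against the actual Vagrantfile before relying on it.

```
# Sketch of what the CI Vagrantfile effectively does at provisioning time:
# fetch the init script from the master branch and hand it to the guest.
# The exact raw URL is an assumption.
curl -fsSL \
  https://raw.githubusercontent.com/digital-asset/daml/master/infra/macos/2-vagrant-files/init.sh \
  -o /tmp/init.sh
less /tmp/init.sh   # review before running anything fetched from the network
```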

The intention is that the macOS nodes, like the Linux and Windows ones, should be cycled daily, so changing their configuration requires no human intervention beyond committing to the master branch. This also means that, short of additional manual intervention, changes take about a day to propagate, whereas the DAML team can apply changes to the Linux and Windows nodes directly through Terraform. I believe this tradeoff is necessary, because access to the underlying macOS hosts should be restricted as much as practical; ideally, they should not be reachable from the internet at all.

## Wiping nodes on a cron

Just like the Linux and Windows nodes, the macOS nodes should be wiped daily. This is the main reason for putting in the effort to virtualize them, as opposed to just setting up physical nodes directly.

As explained in the Vagrant section of this document, this should be as simple as adding a cron job to run

```
cd path/to/Vagrantfile/folder && vagrant destroy -f && GUEST_NAME=... VSTS_TOKEN=... vagrant up
```

every day at 4AM UTC (to synchronize with the Linux and Windows nodes).
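
A minimal sketch of what that crontab entry could look like on the host follows; all paths and names are placeholders, and note that cron evaluates the hour in the host's local time zone, so the `4` below assumes the host clock is set to UTC.

```
# m h  dom mon dow   command
# Placeholders throughout; cron runs with a minimal PATH, so the full path to
# the vagrant binary may be needed depending on how Vagrant was installed.
0 4 * * * cd /path/to/Vagrantfile/folder && vagrant destroy -f && GUEST_NAME=some-node VSTS_TOKEN="$(cat /path/to/token)" vagrant up >> /var/log/macos-node-recycle.log 2>&1
```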

## Proxying the cache

It is likely (though not certain) that, at some point, we will want to reduce the amount of traffic between our macOS CI nodes and the GCP-hosted caches, for both performance and cost reasons. Under VirtualBox, guest VMs by default use their host machine as the default gateway, so this should be feasible through standard HTTP proxying.

However, I have not yet spent much time investigating this.
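
For what it's worth, here is a rough sketch of what that could look like from inside a guest, assuming a caching proxy were running on the host; the proxy software and port are assumptions, and nothing like this is set up today. Under VirtualBox's default NAT configuration, the host is reachable from the guest at 10.0.2.2.

```
# Inside the guest: point standard HTTP(S) proxy variables at the host.
# 10.0.2.2 is VirtualBox's default NAT address for the host; 3128 is a
# placeholder port for whatever caching proxy would run there.
export http_proxy=http://10.0.2.2:3128
export https_proxy=http://10.0.2.2:3128
```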

## Other virtualization techniques

While this folder describes one known-to-work way to get CI nodes, there are alternatives. I have not spent much time exploring other virtualization tools or other ways to create an initial "blank" macOS virtual hard drive, though I believe the provided init.sh script should work with most other approaches with little or no change.