macOS filesystems have been case-insensitive by default for years, and
our laptops in particular are set up that way, so if we want the cache
to work as expected, CI should be case-insensitive too.
Note: this does not apply to Nix, because the Nix partition is a
case-sensitive image per @nycnewman's script on laptops too.
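
To confirm which behaviour a given volume actually has, a probe along these lines can help. This is a minimal sketch, not part of the change itself: the function name `is_case_insensitive` and the probe file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the filesystem holding the
# directory $1 is case-insensitive, exits 1 when it is case-sensitive,
# and exits 2 if the probe directory cannot be created.
is_case_insensitive() {
    probe_dir=$(mktemp -d "${1:-.}/case-probe.XXXXXX") || return 2
    touch "$probe_dir/probe"
    # On a case-insensitive filesystem the upper-case name resolves to
    # the file we just created.
    if [ -e "$probe_dir/PROBE" ]; then
        result=0
    else
        result=1
    fi
    rm -rf "$probe_dir"
    return "$result"
}
```

For example, `is_case_insensitive "$HOME" && echo "case-insensitive"` reports on the volume holding the home directory.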
CHANGELOG_BEGIN
CHANGELOG_END
We've recently seen a few cases where the macOS nodes ended up not
having the cache partition mounted. So far this has only happened on
semi-broken nodes (guest VM still up and running but host unable to
connect to it), so I haven't been able to actually poke at a broken
machine, but I believe this should allow a machine in such a state to
recover.
While we haven't observed a similar issue on Linux nodes (as far as I'm
aware), I have made similar changes there to keep both scripts in sync.
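
The general shape of such a recovery check is sketched below. It is not the actual script: the function names are hypothetical and the remount command is supplied by the caller, since the real one depends on how the cache disk is attached.

```shell
#!/bin/sh
# Hypothetical helper: reads `mount`-style output on stdin and succeeds
# if $1 appears as a mount point. Both macOS and Linux print lines of
# the form "<device> on <mount point> ...".
is_mount_point() {
    grep -qF " on $1 "
}

# Hypothetical helper: if $1 is not currently mounted, run the remaining
# arguments as the remount command.
ensure_mounted() {
    mount_point=$1
    shift
    if mount | is_mount_point "$mount_point"; then
        return 0
    fi
    "$@"
}
```

A call would look like `ensure_mounted "$HOME/.bazel-cache" some-remount-command`, where `some-remount-command` stands in for whatever the node actually uses.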
CHANGELOG_BEGIN
CHANGELOG_END
This is adapting the same approach as #9137 to the macOS machines. The
setup is very similar, except macOS apparently doesn't require any kind
of `sudo` access in the process.
The main reason for the change here is that while `~/.bazel-cache` is
reasonably fast to clean, cleaning just that has finally caught up to us
with a recent cleanup step that proudly claimed:
```
before: 638Mi free
after: 1.2Gi free
```
So we do need to start cleaning the other one after all.
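
For reference, a cleanup step that reports free space before and after can be sketched as follows. The function names and the threshold are assumptions for illustration; the actual cleanup command is passed in by the caller.

```shell
#!/bin/sh
# Prints the available space, in KiB, on the filesystem holding $1.
# `df -Pk` gives POSIX-format output; column 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds when the filesystem holding $1 has less than $2 KiB free.
needs_cleanup() {
    [ "$(free_kib "$1")" -lt "$2" ]
}

# Runs the cleanup command (arguments after the first two) only when
# free space is below the threshold, reporting before and after.
clean_if_low() {
    dir=$1
    threshold_kib=$2
    shift 2
    if needs_cleanup "$dir" "$threshold_kib"; then
        echo "before: $(free_kib "$dir") KiB free"
        "$@"
        echo "after: $(free_kib "$dir") KiB free"
    fi
}
```

Passing the cleanup command as arguments keeps the threshold logic independent of which directory ends up being cleaned.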
CHANGELOG_BEGIN
CHANGELOG_END
As we strive for more inclusiveness, we are becoming less comfortable
with historically-charged terms being used in our everyday work.
This is targeted for merge on Dec 26, _after_ the necessary
corresponding changes at both the GitHub and Azure Pipelines levels.
CHANGELOG_BEGIN
- DAML Connect development is now conducted from the `main` branch,
rather than the `master` one. If you had any dependency on the
digital-asset/daml repository, you will need to update this parameter.
CHANGELOG_END
This is far from perfect but removes the blatantly wrong sections of the
README.
Note: as a README change, this is not really a standard change, but
because the README is under the infra folder, this PR does need the tag
to pass CI.
CHANGELOG_BEGIN
CHANGELOG_END
This is the macOS part of #5912, which I have separated because our
macOS nodes have a different deployment process so it seemed easier to
track the deployment of the change separately.
CHANGELOG_BEGIN
CHANGELOG_END
multistep macos setup
This updates the macOS node setup instructions to avoid repeating
identical work and network traffic across all machines through
initialization by building a "daily" image with all the tools and code
we need.
CHANGELOG_BEGIN
CHANGELOG_END
* Fix 3-running-box to remount nix partition
* updated scripts to use multi-step process
* add copyright notices
Co-authored-by: nycnewman <edward@digitalasset.com>
See #6400; split out as separate PR so master == reality and we can
track when this is done. @nycnewman please merge this once the change
is deployed.
Note: it has to be deployed before the next restart; nodes will _not_ be
able to boot with the current configuration.
CHANGELOG_BEGIN
CHANGELOG_END
It looks like some nix update has broken our current Terraform setup.
The Google provider plugin has changed its reported version to 0.0.0;
poking at my local nix store seems to indicate we actually get 3.15, but
🤷.
This PR also reverts the infra part of #6400 so we get back to master ==
reality.
CHANGELOG_BEGIN
CHANGELOG_END
Nix now requires -L, so I’ve gone ahead and normalized everything to
use -sfL, which we were already using in one place.
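
Assuming these are `curl` flags (the message does not name the tool), `-s` silences progress output, `-f` makes HTTP errors produce a non-zero exit code, and `-L` follows redirects. A small wrapper, with the hypothetical name `fetch`, keeps the flags in one place:

```shell
#!/bin/sh
# Hypothetical wrapper: every download goes through the same flags.
#   -s  silent (no progress meter)
#   -f  fail with a non-zero exit code on HTTP errors
#   -L  follow redirects
fetch() {
    curl -sfL "$@"
}
```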
changelog_begin
changelog_end
* Fix launchd killing VMware process at end of script execution
CHANGELOG_BEGIN
Fix issue with macOS Catalina launchd killing the VMware instance on rebuild (AbandonProcessGroup)
CHANGELOG_END
* Updates to support VMware variant of Hypervisor
* Update infra/macos/scripts/rebuild-crontask.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
* Update infra/macos/scripts/run-agent.sh
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
Co-authored-by: Gary Verhaegen <gary.verhaegen@digitalasset.com>
We have seen the following error message crop up a couple of times
recently:
```
FATAL: could not create shared memory segment: No space left on device
DETAIL: Failed system call was shmget(key=5432001, size=56, 03600).
HINT: This error does *not* mean that you have run out of disk space.
It occurs either if all available shared memory IDs have been taken, in
which case you need to raise the SHMMNI parameter in your kernel, or
because the system's overall limit for shared memory has been reached.
The PostgreSQL documentation contains more information about shared
memory configuration.
child process exited with exit code 1
```
Based on [the PostgreSQL
documentation](https://www.postgresql.org/docs/12/kernel-resources.html),
this should fix it.
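
To see the current limits on a node, the System V shared memory settings that the PostgreSQL documentation lists for macOS can be read with `sysctl`. This sketch only inspects values and does not reproduce the fix itself; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the macOS System V shared memory limits.
# Keys that cannot be read (for example on a non-macOS host) are
# reported as "unavailable".
show_shm_limits() {
    for key in kern.sysv.shmmax kern.sysv.shmmin kern.sysv.shmmni \
               kern.sysv.shmseg kern.sysv.shmall; do
        value=$(sysctl -n "$key" 2>/dev/null) || value="unavailable"
        echo "$key=$value"
    done
}
```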
CHANGELOG_BEGIN
CHANGELOG_END
set up macOS nodes
This PR documents how to create and manage macOS CI nodes. Because macOS
is not supported by our current cloud providers, these instructions are
geared towards creating VMs on physical machines we would need to host
and manage ourselves, i.e. these notes are mostly targeted at Ed.
CHANGELOG_BEGIN
CHANGELOG_END