Commit Graph

20 Commits

Author SHA1 Message Date
Gary Verhaegen
2923048935
remove purge_old_agents (#6439)
This script was supposed to remove old agents from the Azure Pipelines
UI. It may have been useful at some time (notably, when we used
ephemeral instances, they did not necessarily get to run their shutdown
script), but as it stands now, it's broken. The output from that step
ends in:

```
error: 2 derivations need to be built, but neither local builds ('--max-jobs') nor remote builds ('--builders') are enabled
```

after listing the nix packages it would build. Furthermore, it does not
seem to be useful as I have not seen any spurious entry in the agents
list on Azure since we switched to permanent nodes, on either the Linux
or Windows side (and this would only run on Linux, if it ran).

I'm also not convinced it ever ran, as I used to see a lot of spurious
machines on both Linux and Windows when we did use ephemeral instances.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-20 17:37:24 +02:00
Gary Verhaegen
d01715bf2f
add redirect to nix curl (linux) (#6407)
This is the second PR in the plan outlined in #6405. I have already
disabled the old nodes so no new job will get started there; I will,
however, wait until I've seen a few successful builds on the new nodes
before pulling the plug.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-18 14:08:21 +02:00
Gary Verhaegen
fba57470a5
restore terraform to working state (#6402)
It looks like some nix update has broken our current Terraform setup.
The Google provider plugin has changed its reported version to 0.0.0;
poking at my local nix store seems to indicate we actually get 3.15, but
🤷.

This PR also reverts the infra part of #6400 so we get back to master ==
reality.

CHANGELOG_BEGIN
CHANGELOG_END
2020-06-18 12:15:27 +02:00
Moritz Kiefer
2c1d4cb805
Fix nix installation (#6400)
Nix now requires -L; I’ve gone ahead and normalized everything to
use -sfL, which we were already using in one place.

changelog_begin
changelog_end
2020-06-18 10:34:08 +02:00
Gary Verhaegen
4a6ab84b69
add default machine capability (#5912)
We semi-regularly need to do work that has the potential to disrupt a
machine's local cache, rendering it broken for other streams of work.
This can include upgrading nix, upgrading Bazel, debugging caching
issues, or anything related to Windows.

Right now we do not have any good solution for these situations. We can
either not do those streams of work, or we can proceed with them and
just accept that all other builds may get affected depending on which
machine they get assigned to. Debugging broken nodes is particularly
tricky as we do not have any way to force a build to run on a given
node.

This PR aims at providing a better alternative by (ab)using an Azure
Pipelines feature called
[capabilities](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#capabilities).
The idea behind capabilities is that you assign a set of tags to a
machine, and then a job can express its
[demands](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml),
i.e. specify a set of tags machines need to have in order to run it.

Support for this is fairly badly documented. We can gather from the
documentation that a job can specify two things about a capability
(through its `demands`): that a given tag exists, and that a given tag
has an exact specified value. In particular, a job cannot specify that a
capability should _not_ be present, meaning we cannot rely on, say,
adding a "broken" tag to broken machines.

Documentation on how to set capabilities for an agent is basically
nonexistent, but [looking at the
code](https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Microsoft.VisualStudio.Services.Agent/Capabilities/UserCapabilitiesProvider.cs)
indicates that they can be set by using a simple `key=value`-formatted
text file, provided we can find the right place to put this file.

This PR adds this file to our Linux, macOS and Windows node init scripts
to define an `assignment` capability and adds a demand for a `default`
value on each job. From then on, when we hit a case where we want a PR
to run on a specific node, and to prevent other PRs from running on that
node, we can manually override the capability from the Azure UI and
update the demand in the relevant YAML file in the PR.
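
As a rough illustration of the matching rules described above, here is a
minimal Python sketch. It is a simplified model, not the Azure Pipelines
implementation: the function names `parse_capabilities` and `satisfies`, the
`debug-cache` override value, and the exact string comparison are all
assumptions made for the example. Only the `key=value` file format, the
`assignment` capability, and the `default` value come from the description
above.

```python
def parse_capabilities(text):
    """Parse key=value lines into a dict, skipping blanks and lines without '='."""
    caps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        caps[key.strip()] = value.strip()
    return caps


def satisfies(capabilities, demands):
    """Return True if every demand is met by the agent's capabilities.

    Two demand forms are modelled, matching what the text describes:
      "name"                -> the capability must exist
      "name -equals value"  -> the capability must have exactly that value
    There is deliberately no way to demand that a capability be absent.
    """
    for demand in demands:
        name, sep, expected = demand.partition(" -equals ")
        name = name.strip()
        if name not in capabilities:
            return False
        if sep and capabilities[name] != expected.strip():
            return False
    return True


# A node initialised by the startup script, and one manually overridden.
default_node = parse_capabilities("assignment=default\n")
reserved_node = parse_capabilities("assignment=debug-cache\n")  # hypothetical value

job_demands = ["assignment -equals default"]
print(satisfies(default_node, job_demands))   # True
print(satisfies(reserved_node, job_demands))  # False
```

Under this model, overriding a node's `assignment` value takes it out of
rotation for every job that still demands `default`, while a PR whose demand
is changed to the override value can only land on that node.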

CHANGELOG_BEGIN
CHANGELOG_END
2020-05-09 18:21:42 +02:00
Gary Verhaegen
43def51fce
add puppeteer dependencies to Linux nodes (#5575)
See #5540 for context.

CHANGELOG_BEGIN
CHANGELOG_END
2020-04-17 01:32:25 +02:00
Gary Verhaegen
1872c668a5
replace DAML Authors with DA in copyright headers (#5228)
Change requested by Manoj.

CHANGELOG_BEGIN
CHANGELOG_END
2020-03-27 01:26:10 +01:00
Gary Verhaegen
878429e3bf
update copyright notices to 2020 (#3939)
copyright update 2020

* update template
* run script: `dade-copyright-headers update .`
* update script
* manual adjustments
* exclude frozen proto files from further header checks (by adding NO_AUTO_COPYRIGHT files)
2020-01-02 21:21:13 +01:00
Gary Verhaegen
99ea93168d
update copyright notices (#2499)
2019-08-13 17:23:03 +01:00
Gary Verhaegen
bf5995f529
remove mentions of da-int servers (#2485)
2019-08-12 10:42:41 +01:00
Bolek@DigitalAsset
1a62841616
infra: add docker daemon to ci agent (#1566)
* installs docker and adds vsts user to docker group
2019-06-08 22:31:55 +00:00
Gary Verhaegen
4120ef2d1b
[linux/ci] fix logging agent (#1356)
There are two issues with the current setup:

- iptables entry prevents connecting to the metadata server, and
- machines are given insufficient permissions.
2019-05-30 15:36:57 +00:00
Gary Verhaegen
ac719e7927
[ci/linux] keep daml copy until it's actually not needed anymore (#1349)
The existing script is deleting the daml directory too early, leading to
the "shutdown agents" step failing.
2019-05-23 15:25:37 +00:00
Gary Verhaegen
be2457cc6a
[ci/linux] restart fluentd after installing (#1290)
It looks like the curl command is currently installing but not starting the service that is supposed to send logs to StackDriver. When connecting to the machines manually, a call to `restart` seems to fix it.
2019-05-21 21:37:51 +00:00
Brian Hansen
f9bb85a5a7
remove -O option from curl command in order to pipe script contents t… (#953)
* remove -O option from curl command in order to pipe script contents to bash

* follow redirects for stackdriver

Co-Authored-By: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>
2019-05-15 18:33:01 +00:00
Gary Verhaegen
e95575b033
install StackDriver on build machines (#905)
Requested by Security
2019-05-04 22:55:51 +00:00
Jonas Chevalier
16aba583ce
CI linux agent changes (#509)
* ci: always use the linux-pool

reduce the difference of environment between external and internal
contributions

* infra: tweak the linux cache warmup script

Don't share the same bazel cache directory with the disk cache, which is
something else. Be more specific about the target. Clean after yourself.

* infra: bump the linux agent disk to 200GB

avoid running out of disk space
2019-04-16 11:35:46 +02:00
Florian Klink
5f75e9d1a0
infra/vsts_agent_linux_startup.sh: warm up local caches, purge old agents (#438)
Warm up local caches by building dev-env and current daml master. This is
allowed to fail, as we still want to have CI machines around, even when
their caches are only warmed up halfway.

Afterwards, we purge old agents that might still be around because they
didn't unregister themselves.

This depends on #402 being merged, as otherwise purge_old_agents.py
obviously can't be found.
2019-04-12 16:47:36 +02:00
Jonas Chevalier
6f90fda6d1
infra: VSTS agent improvements (#369)
* infra: replace the debian image by ubuntu 16.04

be closer to what the azure vmImage is using

* infra: limit access to the PAT token
2019-04-11 17:11:14 +02:00
zimbatm
430a85649c
add more Azure Pipeline agents (#230)
* nix: add more providers to terraform
* docs: make tarballs more reproducible
* ci: use the linux-pool pool
* ci: tweak the nix installation

handle the case where the user is root and on ubuntu

* infra: terraform fmt

* infra: add Azure Pipeline agents

* ci: only enable linux-pool for internal PRs
2019-04-09 18:59:37 +02:00