ci/linux: kill machines if they fail to clean up (#8835)

It does not seem like CI machines recover from a failed clean-up. This
is not the most elegant solution possible, but it's a cheap one that
should work.

Note: shutting down the machine in the middle of the build will not
provide an error message to Slack for main branch builds (because the
`tell_slack_failed` step would need to run on the same machine) but will
correctly report failure for PRs (that was the original purpose of the
`collect_build_data` step).

An alternative here would be to give a delay to the shutdown command,
and try to calibrate it so that it's long enough for this job to
correctly report its failure to both Azure and Slack, while making it
short enough that no other job gets assigned to the machine. I'm not
clear enough on how often Azure assigns jobs to try and bet on that.
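
For reference, a minimal sketch of what that delayed alternative could look
like. The variable `SHUTDOWN_DELAY_MIN`, its default of 2 minutes, and the
helper `delayed_shutdown_cmd` are hypothetical names used for illustration
only; this commit does not implement or calibrate any delay.

```shell
#!/usr/bin/env bash
# Hypothetical delay in minutes; the right value is exactly what would need
# calibrating against how quickly Azure assigns the next job.
SHUTDOWN_DELAY_MIN=${SHUTDOWN_DELAY_MIN:-2}

# Print the command an EXIT trap would run. `shutdown -h +N` schedules a halt
# N minutes from now instead of halting immediately.
delayed_shutdown_cmd() {
  echo "shutdown -h +${SHUTDOWN_DELAY_MIN}"
}

# On a real agent the trap would be armed like this (left commented out so
# the sketch is safe to run anywhere):
#   trap "$(delayed_shutdown_cmd)" EXIT
```

A pending delayed shutdown can be cancelled with `shutdown -c`, so the
success path could either clear the trap, as this commit does, or cancel
explicitly.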

CHANGELOG_BEGIN
CHANGELOG_END
Gary Verhaegen 2021-02-12 19:32:14 +01:00 committed by GitHub
parent 638a1ffeb1
commit df0086d26f

@@ -8,6 +8,12 @@ steps:
 # infra/macos/2-common-box/init.sh:echo "build:darwin --disk_cache=~/.bazel-cache" > ~/.bazelrc
 # infra/vsts_agent_linux_startup.sh:echo "build:linux --disk_cache=~/.bazel-cache" > ~/.bazelrc
+# Linux machines don't seem to recover when this script fails, and they get
+# renewed by the instance_group
+if [ "$(uname -s)" == "Linux" ]; then
+trap "shutdown -h now" EXIT
+fi
 if [ $(df -m . | sed 1d | awk '{print $4}') -lt 50000 ]; then
 echo "Disk full, cleaning up..."
 disk_cache="$HOME/.bazel-cache"
@@ -24,4 +30,5 @@ steps:
 fi
 fi
 df -h .
+trap - EXIT
 displayName: clean-up disk cache
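
The arm-then-disarm pattern in this diff can be tried out safely with the
sketch below. It assumes the script exits early when a clean-up command
fails. `kill_machine` is a hypothetical stand-in for `shutdown -h now`, and
`guarded_cleanup` is an illustrative wrapper that does not exist in the
repository.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `shutdown -h now`, so nothing is actually halted.
kill_machine() { echo "machine would shut down now"; }

# Run a clean-up command in a subshell with the EXIT trap armed.
# The trap is cleared only if the command succeeds, mirroring the
# `trap "shutdown -h now" EXIT` ... `trap - EXIT` pair in the diff.
guarded_cleanup() (
  trap kill_machine EXIT
  "$@" || exit 1   # failure: leave the subshell with the trap still armed
  trap - EXIT      # success: disarm, so a normal exit does nothing
)

guarded_cleanup true    # prints nothing
guarded_cleanup false   # prints the stand-in shutdown message
```

The key point is that `trap - EXIT` is only reached on the success path, so
any earlier exit leaves the trap in place and triggers it.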