# Copyright (c) 2021 Digital Asset (Switzerland) GmbH and/or its affiliates. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# Do not run on PRs
pr : none
# Do not run on merge to main
trigger : none
# Do run on a schedule (daily)
#
# Note: machines are killed every day at 4AM UTC, so we need to either:
# - run early enough before that for this job not to get killed, or
# - run late enough after that for the machines to be initialized.
#
# Targeting 6AM UTC seems to fit that.
schedules :
- cron : "0 6 * * *"
displayName : daily checks and reporting
branches :
include :
- main
always : true
jobs :
- job : compatibility_ts_libs
timeoutInMinutes : 60
pool :
name : linux-pool
demands : assignment -equals default
steps :
- checkout : self
- template : ../compatibility_ts_libs.yml
- template : ../daily_tell_slack.yml
- job : compatibility
dependsOn : compatibility_ts_libs
timeoutInMinutes : 720
strategy :
matrix :
linux :
pool : linux-pool
macos :
pool : macOS-pool
pool :
name : $(pool)
demands : assignment -equals default
steps :
- checkout : self
clear shared memory segment on macOS (#6530)
For a while now we've had errors along the lines of
```
FATAL: could not create shared memory segment: No space left on device
DETAIL: Failed system call was shmget(key=5432001, size=56, 03600).
HINT: This error does *not* mean that you have run out of disk space.
It occurs either if all available shared memory IDs have been taken, in
which case you need to raise the SHMMNI parameter in your kernel, or
because the system's overall limit for shared memory has been reached.
The PostgreSQL documentation contains more information about
shared memory configuration.
child process exited with exit code 1
```
on macOS CI nodes, which we were not able to reproduce locally. Today I
managed to, sort of by accident, and that allowed me to dig a bit
further.
The root cause seems to be that PostgreSQL, as run by Bazel, does not
always seem to properly unlink the shared memory segment it uses to
communicate with itself. On my machine, running:
```
bazel test -t- --runs_per_test=100 //ledger/sandbox:conformance-test-wall-clock-postgresql
```
and eyeballing the results of
```
watch ipcs -mcopt
```
I would say about one in three runs leaks its memory segment. After much
googling and some head scratching trying to figure out the C APIs for
managing shared memory segments on macOS, I kind of stumbled on a
reference to `ipcrm` in a comment to some low-ranking StackOverflow
answer. It looks like it's working very well on my machine, even if I
run it while a test (and therefore an instance of pg) is running. I
believe this is because the command does not actually remove the shared
memory segments, but simply marks them for removal once the last process
stops using them. (At least that's what the manpage describes.)
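As a rough illustration of the cleanup described above, the sketch below lists shared memory segments with `ipcs` and marks each one for removal with `ipcrm`. It assumes macOS-style `ipcs -m` output, where segment rows start with `m` and carry the segment ID in the second column. The function names are made up for this example and are not taken from the actual template.

```shell
#!/usr/bin/env bash
# Sketch only, under the assumptions stated above.

# Read `ipcs -m` output on stdin and print one segment ID per line.
# Assumption: segment rows start with "m" and have the ID in column 2.
segment_ids() {
  awk '$1 == "m" { print $2 }'
}

# Ask the system to remove every listed shared memory segment.
# Failures (e.g. segments owned by another user) are ignored.
clear_segments() {
  local id
  for id in $(ipcs -m | segment_ids); do
    ipcrm -m "$id" || true
  done
}
```

Calling `clear_segments` before a test run would correspond to the cleanup step this message describes; whether the real template does exactly this is not shown here.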
CHANGELOG_BEGIN
CHANGELOG_END
- ${{ if eq(variables['pool'], 'macos-pool') }}:
- template : ../clear-shared-segments-macos.yml
- template : ../compatibility.yml
- template : ../daily_tell_slack.yml
- job : compatibility_windows
dependsOn : compatibility_ts_libs
timeoutInMinutes : 720
pool :
name : windows-pool
add default machine capability (#5912)
add default machine capability
We semi-regularly need to do work that has the potential to disrupt a
machine's local cache, rendering it broken for other streams of work.
This can include upgrading nix, upgrading Bazel, debugging caching
issues, or anything related to Windows.
Right now we do not have any good solution for these situations. We can
either not do those streams of work, or we can proceed with them and
just accept that all other builds may get affected depending on which
machine they get assigned to. Debugging broken nodes is particularly
tricky as we do not have any way to force a build to run on a given
node.
This PR aims at providing a better alternative by (ab)using an Azure
Pipelines feature called
[capabilities](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#capabilities).
The idea behind capabilities is that you assign a set of tags to a
machine, and then a job can express its
[demands](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml),
i.e. specify a set of tags machines need to have in order to run it.
Support for this is fairly badly documented. We can gather from the
documentation that a job can specify two things about a capability
(through its `demands`): that a given tag exists, and that a given tag
has an exact specified value. In particular, a job cannot specify that a
capability should _not_ be present, meaning we cannot rely on, say,
adding a "broken" tag to broken machines.
Documentation on how to set capabilities for an agent is basically
nonexistent, but [looking at the
code](https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Microsoft.VisualStudio.Services.Agent/Capabilities/UserCapabilitiesProvider.cs)
indicates that they can be set by using a simple `key=value`-formatted
text file, provided we can find the right place to put this file.
This PR adds this file to our Linux, macOS and Windows node init scripts
to define an `assignment` capability and adds a demand for a `default`
value on each job. From then on, when we hit a case where we want a PR
to run on a specific node, and to prevent other PRs from running on that
node, we can manually override the capability from the Azure UI and
update the demand in the relevant YAML file in the PR.
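To make the `key=value` format concrete, here is a minimal sketch of a helper that sets one capability in such a file. The file location is not given in this message, so the caller passes it explicitly; the function name `set_capability` and the value `debug-node` are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: maintain a key=value capabilities file.
# The path of the file is an assumption supplied by the caller.
set_capability() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  # Drop any previous line for this key (grep exits 1 if nothing is left).
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  mv "${file}.tmp" "$file"
  # Append the new value for the key.
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

For example, `set_capability "$caps_file" assignment default` would produce the line `assignment=default`, which matches the `assignment -equals default` demand used by the jobs in this file. Overriding the value on one machine (say, to `debug-node`) and demanding that value in a PR would then pin the PR to that machine.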
CHANGELOG_BEGIN
CHANGELOG_END
demands : assignment -equals default
steps :
- checkout : self
- template : ../compatibility-windows.yml
- task : PublishBuildArtifacts@1
condition : succeededOrFailed()
inputs :
pathtoPublish : '$(Build.StagingDirectory)'
artifactName : 'Bazel Compatibility Logs'
- template : ../daily_tell_slack.yml
- job : perf_speedy
timeoutInMinutes : 120
pool :
name : "linux-pool"
demands : assignment -equals default
steps :
- checkout : self
- bash : ci/dev-env-install.sh
displayName : 'Build/Install the Developer Environment'
- bash : ci/configure-bazel.sh
displayName : 'Configure Bazel for root workspace'
env :
IS_FORK : $(System.PullRequest.IsFork)
# to upload to the bazel cache
GOOGLE_APPLICATION_CREDENTIALS_CONTENT : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- template : ../bash-lib.yml
parameters :
var_name : bash_lib
- bash : |
set -euo pipefail
eval "$(dev-env/bin/dade assist)"
source $(bash_lib)
BASELINE="cebc26af88efef4a7c81c62b0c14353f829b755e"
TEST_SHA=$(cat ci/cron/perf/test_sha)
OUT="$(Build.StagingDirectory)/perf-results.json"
START=$(date -u +%Y%m%d_%H%M%SZ)
if git diff --exit-code $TEST_SHA -- daml-lf/scenario-interpreter/src/perf >&2; then
# no changes, all good
ci/cron/perf/compare.sh $BASELINE > "$OUT"
cat "$OUT"
else
# the tests have changed, we need to figure out what to do with
# the baseline.
echo "Baseline no longer valid, needs manual correction." > "$OUT"
fi
gcs "$GCRED" cp "$OUT" gs://daml-data/perf/speedy/$START.json
displayName : measure perf
env :
GCRED : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- template : ../daily_tell_slack.yml
parameters :
success-message : '$(cat $(Build.StagingDirectory)/perf-results.json | jq . | jq -sR ''"perf for ''"$COMMIT_LINK"'':```\(.)```"'')'
- job : perf_http_json
timeoutInMinutes : 120
pool :
name : "linux-pool"
demands : assignment -equals default
steps :
- checkout : self
- bash : ci/dev-env-install.sh
displayName : 'Build/Install the Developer Environment'
- bash : ci/configure-bazel.sh
displayName : 'Configure Bazel for root workspace'
env :
IS_FORK : $(System.PullRequest.IsFork)
# to upload to the bazel cache
GOOGLE_APPLICATION_CREDENTIALS_CONTENT : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- template : ../bash-lib.yml
parameters :
var_name : bash_lib
- bash : |
set -euo pipefail
eval "$(dev-env/bin/dade assist)"
source $(bash_lib)
SCENARIOS="\
com.daml.http.perf.scenario.CreateCommand \
com.daml.http.perf.scenario.ExerciseCommand \
com.daml.http.perf.scenario.CreateAndExerciseCommand \
com.daml.http.perf.scenario.AsyncQueryConstantAcs \
com.daml.http.perf.scenario.SyncQueryConstantAcs \
com.daml.http.perf.scenario.SyncQueryNewAcs \
com.daml.http.perf.scenario.SyncQueryVariableAcs \
"
bazel build //docs:quickstart-model
DAR="${PWD}/bazel-bin/docs/quickstart-model.dar"
JWT="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJodHRwczovL2RhbWwuY29tL2xlZGdlci1hcGkiOnsibGVkZ2VySWQiOiJNeUxlZGdlciIsImFwcGxpY2F0aW9uSWQiOiJmb29iYXIiLCJhY3RBcyI6WyJBbGljZSJdfX0.VdDI96mw5hrfM5ZNxLyetSVwcD7XtLT4dIdHIOa9lcU"
START=$(git log -n1 --format=%cd --date=format:%Y%m%d).$(git rev-list --count HEAD).$(Build.BuildId).$(git log -n1 --format=%h --abbrev=8)
REPORT_ID="http_json_perf_results_${START}"
OUT="$(Build.StagingDirectory)/${REPORT_ID}"
for scenario in $SCENARIOS; do
bazel run //ledger-service/http-json-perf:http-json-perf-binary -- \
--scenario=${scenario} \
--dars=${DAR} \
--reports-dir=${OUT} \
--jwt=${JWT}
done
GZIP=-9 tar -zcvf ${OUT}.tgz ${OUT}
gcs "$GCRED" cp "$OUT.tgz" "gs://daml-data/perf/http-json/${REPORT_ID}.tgz"
displayName : measure http-json performance
env :
GCRED : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- job : check_releases
timeoutInMinutes : 240
pool :
name : linux-pool
demands : assignment -equals default
steps :
- checkout : self
- bash : ci/dev-env-install.sh
displayName : 'Build/Install the Developer Environment'
- template : ../bash-lib.yml
parameters :
var_name : bash_lib
- bash : |
set -euo pipefail
eval "$(dev-env/bin/dade assist)"
bazel build //ci/cron:cron
bazel-bin/ci/cron/cron check --bash-lib $(bash_lib) --gcp-creds "$GCRED"
displayName : check releases
env :
GCRED : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- template : ../daily_tell_slack.yml
- job : blackduck_scan
timeoutInMinutes : 1200
pool :
name : linux-pool
demands : assignment -equals default
steps :
- checkout : self
persistCredentials : true
- bash : ci/dev-env-install.sh
displayName : 'Build/Install the Developer Environment'
- bash : ci/configure-bazel.sh
displayName : 'Configure Bazel'
env :
IS_FORK : $(System.PullRequest.IsFork)
# to upload to the bazel cache
GOOGLE_APPLICATION_CREDENTIALS_CONTENT : $(GOOGLE_APPLICATION_CREDENTIALS_CONTENT)
- bash : |
set -euo pipefail
eval "$(dev-env/bin/dade assist)"
export LC_ALL=en_US.UTF-8
bazel build //...
# Make sure that Bazel query works
bazel query 'deps(//...)' >/dev/null
displayName : 'Build'
- bash : |
set -euo pipefail
eval "$(./dev-env/bin/dade-assist)"
# Needs to be specified since Black Duck cannot scan all Bazel dependency types in one go; Haskell has to be scanned separately, with a uniquely identified code location name to avoid stomping.
BAZEL_DEPENDENCY_TYPE="haskell_cabal_library"
bash <(curl -s https://raw.githubusercontent.com/DACH-NY/security-blackduck/master/synopsys-detect) \
ci-build digital-asset_daml $(Build.SourceBranchName) \
--logging.level.com.synopsys.integration=DEBUG \
--detect.tools=BAZEL \
--detect.bazel.target=//... \
--detect.bazel.dependency.type=${BAZEL_DEPENDENCY_TYPE} \
--detect.policy.check.fail.on.severities=MAJOR,CRITICAL,BLOCKER \
--detect.notices.report=true \
--detect.code.location.name=digital-asset_daml_${BAZEL_DEPENDENCY_TYPE} \
--detect.report.timeout=1500
displayName : 'Blackduck Haskell Scan'
env :
BLACKDUCK_HUBDETECT_TOKEN : $(BLACKDUCK_HUBDETECT_TOKEN)
- bash : |
set -euo pipefail
eval "$(./dev-env/bin/dade-assist)"
#avoid stomping any previous bazel haskell scans for this repository by qualifying as a maven_install (aka jvm) bazel blackduck scan
BAZEL_DEPENDENCY_TYPE="maven_install"
bash <(curl -s https://raw.githubusercontent.com/DACH-NY/security-blackduck/master/synopsys-detect) \
ci-build digital-asset_daml $(Build.SourceBranchName) \
--logging.level.com.synopsys.integration=DEBUG \
--detect.npm.include.dev.dependencies=false \
--detect.excluded.detector.types=NUGET \
--detect.excluded.detector.types=GO_MOD \
--detect.yarn.prod.only=true \
--detect.python.python3=true \
--detect.tools=DETECTOR,BAZEL,DOCKER \
--detect.bazel.target=//... \
--detect.bazel.dependency.type=${BAZEL_DEPENDENCY_TYPE} \
--detect.detector.search.exclusion.paths=.bazel-cache,language-support/ts/codegen/tests/ts,language-support/ts,language-support/scala/examples/iou-no-codegen,language-support/scala/examples/quickstart-scala,docs/source/app-dev/bindings-java/code-snippets,docs/source/app-dev/bindings-java/quickstart/template-root \
--detect.cleanup=false \
--detect.policy.check.fail.on.severities=MAJOR,CRITICAL,BLOCKER \
--detect.notices.report=true \
--detect.cleanup.bdio.files=true \
--detect.code.location.name=digital-asset_daml_${BAZEL_DEPENDENCY_TYPE} \
--detect.report.timeout=4500
displayName : 'Blackduck Scan'
env :
BLACKDUCK_HUBDETECT_TOKEN : $(BLACKDUCK_HUBDETECT_TOKEN)
- template : ../bash-lib.yml
parameters :
var_name : bash_lib
- bash : |
set -euo pipefail
eval "$(./dev-env/bin/dade-assist)"
source $(bash_lib)
tr -d '\015' <*_Black_Duck_Notices_Report.txt | grep -v digital-asset_daml >NOTICES
if git diff --exit-code -- NOTICES; then
echo "NOTICES file already up-to-date."
else
git add NOTICES
open_pr "notices-update-$(Build.BuildId)" "update NOTICES file"
fi
displayName : notices
- template : ../daily_tell_slack.yml