digital-asset/daml - daml - gitea: Gitea Service

mirror of https://github.com/digital-asset/daml.git synced 2024-09-20 09:17:43 +03:00

Author	SHA1	Message	Date
Gary Verhaegen	55776f92ba	clear shared memory segment on macOS (#6530) For a while now we've had errors along the lines of ``` FATAL: could not create shared memory segment: No space left on device DETAIL: Failed system call was shmget(key=5432001, size=56, 03600). HINT: This error does not mean that you have run out of disk space. It occurs either if all available shared memory IDs have been taken, in which case you need to raise the SHMMNI parameter in your kernel, or because the system's overall limit for shared memory has been reached. The PostgreSQL documentation contains more information about shared memory configuration. child process exited with exit code 1 ``` on macOS CI nodes, which we were not able to reproduce locally. Today I managed to, sort of by accident, and that allowed me to dig a bit further. The root cause seems to be that PostgreSQL, as run by Bazel, does not always seem to properly unlink the shared memory segment it uses to communicate with itself. On my machine, running: ``` bazel test -t- --runs_per_test=100 //ledger/sandbox:conformance-test-wall-clock-postgresql ``` and eyeballing the results of ``` watch ipcs -mcopt ``` I would say about one in three runs leaks its memory segment. After much googling and some head scratching trying to figure out the C APIs for managing shared memory segments on macOS, I kind of stumbled on a reference to `ipcrm` in a comment to some low-ranking StackOverflow answer. It looks like it's working very well on my machine, even if I run it while a test (and therefore an instance of pg) is running. I believe this is because the command does not actually remove the shared memory segments, but simply marks them for removal once the last process stops using them. (At least that's what the manpage describes.) CHANGELOG_BEGIN CHANGELOG_END	2020-06-30 01:40:16 +02:00
Gary Verhaegen	664df64e13	fix daily perf Slack notification (#6267) This PR fixes the Slack notification on daily perf runs. It also updates the perf sha. CHANGELOG_BEGIN CHANGELOG_END	2020-06-09 06:45:58 +00:00
Gary Verhaegen	445f6467d9	daily run: warn on master only (#6177) Currently the message to Slack is always triggered by running the daily checks. This means that it gets very noisy to: 1. Run the check on PRs affecting the check (like this one), 2. Rerun the check multiple times to ascertain that a given failure is flaky. With this PR, the message to Slack is replaced with a simple `echo` when these checks are not run from the `master` branch, so whoever (manually) triggered them can still get feedback on the result, but other people don't get spurious `@here` mentions. CHANGELOG_BEGIN CHANGELOG_END	2020-06-03 16:36:05 +02:00
Moritz Kiefer	629ec732dd	Include puppeteer tests in compat tests (#6018) * Include puppeteer tests in compat tests This PR adds the puppeteer-based tests to the compatibility tests. This also means that they are now actually compatibility tests. Before, we only tested the SDK side. Apart from process management being a nightmare on Windows as usual, there are two things that might stick out here: 1. I’ve replaced the `sh_binary` wrapper by a `cc_binary`. There is a lengthy comment explaining why. I think at the moment, we could actually get rid of the wrapper completely and add JAVA to path in the tests that need it but at least for now, I’d like to keep it until we are sure that we don’t need to add more to it (and then it’s also in the git history if we do need to resurrect it). 2. These tests are duplicated now similar to the `daml ledger` tests. The reasoning here is different. They depend on the SDK tarball either way so performance-wise there is no reason to keep them. However, we reference the other file in the docs which means we cannot change it freely. What we could do is to make this sufficiently flexible to handle both the `daml start` case and separate `daml sandbox`/`daml json-api` processes and then we can reference it in the docs. There is still added complexity for Windows but that’s necessary for users as well that want to run this on Windows so that seems unavoidable. (I should probably also remove my snarky comments 😇) I’d like to keep it duplicated for this PR and then we can clean it up afterwards. changelog_begin changelog_end Bump timeouts changelog_begin changelog_end	2020-05-22 14:02:59 +02:00
Moritz Kiefer	4916a28682	Include create-daml-app tests in compatibility tests (#5945) This is the first part of #5700. It adds tests that build create-daml-app using `daml build` and then run the codegen and build the UI. Contrary to our main tests, these also run on Windows. This is actually reasonably simple by first building the typescript libraries on Linux and then downloading them on Windows. There are two parts that are still missing from the tests in the main workspace: 1. Building the extra feature. This should be fairly easy to add. 2. Running the puppeteer tests. At least macOS and Linux should be reasonably easy. I don’t know what horrors Windows will throw at us. This step is what actually makes this a compatibility test. Currently it doesn’t actually launch Sandbox and the JSON API. Since this PR is already pretty large, I’d like to tackle those things separately. changelog_begin changelog_end	2020-05-13 10:39:51 +02:00
Gary Verhaegen	3899a59a11	switch back to hosted macOS nodes (#5935) CHANGELOG_BEGIN CHANGELOG_END	2020-05-11 22:59:33 +02:00
Gary Verhaegen	9b476416b8	switch back to Azure-provided macos nodes (#5920) This is temporary. It looks like the macOS nodes are dead; @nycnewman is looking into it, but in case he doesn't fix it in time, at least we have a backup plan so we're not completely blocked on Monday. CHANGELOG_BEGIN CHANGELOG_END	2020-05-11 09:28:40 +02:00
Gary Verhaegen	4a6ab84b69	add default machine capability (#5912) add default machine capability We semi-regularly need to do work that has the potential to disrupt a machine's local cache, rendering it broken for other streams of work. This can include upgrading nix, upgrading Bazel, debugging caching issues, or anything related to Windows. Right now we do not have any good solution for these situations. We can either not do those streams of work, or we can proceed with them and just accept that all other builds may get affected depending on which machine they get assigned to. Debugging broken nodes is particularly tricky as we do not have any way to force a build to run on a given node. This PR aims at providing a better alternative by (ab)using an Azure Pipelines feature called [capabilities](https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#capabilities). The idea behind capabilities is that you assign a set of tags to a machine, and then a job can express its [demands](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml), i.e. specify a set of tags machines need to have in order to run it. Support for this is fairly badly documented. We can gather from the documentation that a job can specify two things about a capability (through its `demands`): that a given tag exists, and that a given tag has an exact specified value. In particular, a job cannot specify that a capability should _not_ be present, meaning we cannot rely on, say, adding a "broken" tag to broken machines. Documentation on how to set capabilities for an agent is basically nonexistent, but [looking at the code](https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Microsoft.VisualStudio.Services.Agent/Capabilities/UserCapabilitiesProvider.cs) indicates that they can be set by using a simple `key=value`-formatted text file, provided we can find the right place to put this file.
This PR adds this file to our Linux, macOS and Windows node init scripts to define an `assignment` capability and adds a demand for a `default` value on each job. From then on, when we hit a case where we want a PR to run on a specific node, and to prevent other PRs from running on that node, we can manually override the capability from the Azure UI and update the demand in the relevant YAML file in the PR. CHANGELOG_BEGIN CHANGELOG_END	2020-05-09 18:21:42 +02:00
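The demand semantics described in the message above (a job can require that a capability exists, or that it has an exact value, but cannot require that it is absent) can be sketched as follows. This is an illustrative Python model, not Azure Pipelines code: the helper names `parse_capabilities` and `satisfies` and the `debug-pr` override value are hypothetical; only the `key=value` file format and the `assignment`/`default` names come from the commit message.

```python
def parse_capabilities(text):
    """Parse a key=value capabilities file into a dict (hypothetical helper)."""
    caps = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        caps[key.strip()] = value.strip()
    return caps


def satisfies(capabilities, demands):
    """Return True if every demand is met.

    A demand is either a bare name (the capability must exist) or a
    (name, value) pair (the capability must equal that value exactly).
    There is deliberately no way to demand that a capability is absent.
    """
    for demand in demands:
        if isinstance(demand, tuple):
            name, value = demand
            if capabilities.get(name) != value:
                return False
        elif demand not in capabilities:
            return False
    return True


# A node left in its default state, and one manually reserved for a PR.
default_node = parse_capabilities("assignment=default\n")
reserved_node = parse_capabilities("assignment=debug-pr\n")  # hypothetical value

job_demands = [("assignment", "default")]
print(satisfies(default_node, job_demands))   # True
print(satisfies(reserved_node, job_demands))  # False
```

In this model, reserving a node amounts to changing its `assignment` value so that ordinary jobs no longer match it, while the PR under investigation demands the overridden value instead.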
Gary Verhaegen	11a2fc3c2d	more flexible perf test check (#5891) This PR separates the "last known valid perf test" commit from the "baseline speedy implementation" commit. It is important for the perf test to be meaningful that the changes between those two commits are benign, say minor API adjustments, so that the perf measurement remains meaningful. This also adds a check on merging to master that tells Slack if the perf test has changed and the `test_sha` file needs updating. The Slack message is conditional on the current commit to avoid excessive noise. CHANGELOG_BEGIN CHANGELOG_END	2020-05-07 13:53:22 +02:00
Gary Verhaegen	204c8b0657	add daily perf report (#5843) This PR adds a simple daily job that runs the performance test on a chosen "baseline" commit and then runs the same benchmark on latest master. This should allow us to track overall performance improvements. CHANGELOG_BEGIN CHANGELOG_END	2020-05-06 13:50:35 +02:00
Moritz Kiefer	b291e96ce1	Publish execution logs from Windows compatibility jobs (#5834) Hopefully, this helps diagnose the Windows CI failures. changelog_begin changelog_end	2020-05-05 12:23:11 +02:00
Moritz Kiefer	49e19ebed1	Make compat tests work on windows (#5732) * Make compat tests work on windows This required some changes to the daml_sdk rule since the read-only installation by the assistant breaks Bazel completely. We could only apply those changes on Windows but I think I prefer the consistency across platforms here over trying to stay close to how the SDK is installed on user machines given that the SDK installation is not something we’ve had issues with. I’ve excluded the postgresql tests for now. I don’t expect them to be particularly hard to fix but I’ve already spent almost 2 days on this and having some tests run on Windows seems like a clear improvement over running no tests on Windows :) changelog_begin changelog_end * Remove todo changelog_begin changelog_end	2020-04-28 16:06:36 +02:00
Gary Verhaegen	cfae3df7fa	report compat status every day (#5744) I believe the compatibility check is important enough, and should fail rarely enough, that it is worth reporting even on success. This will mean (once we support Windows) 3 messages a day, sent while presumably nobody is working, so the disruption should be minimal. The issue with reporting only on failures is that, if we don't proactively check (which we do for the state of master for different reasons, but would likely not keep doing for a job that doesn't block PRs), we may get into a state where it is so broken that it doesn't even report. CHANGELOG_BEGIN CHANGELOG_END	2020-04-28 12:30:45 +02:00
Gary Verhaegen	7ceda5678a	run compatibility tests on macos (#5723) This PR extends the existing Linux compatibility tests to run on macOS too. Fixes #5692. CHANGELOG_BEGIN CHANGELOG_END Co-authored-by: Moritz Kiefer <moritz.kiefer@purelyfunctional.org>	2020-04-27 14:55:16 +02:00
Moritz Kiefer	0d1f21e4a2	Extend compatibility tests to test against HEAD (#5714) fixes #5691 changelog_begin changelog_end	2020-04-24 14:43:35 +02:00
Moritz Kiefer	f61aadc422	Add dade-assist step to compat cron job (#5712) changelog_begin changelog_end	2020-04-24 11:25:14 +02:00
Gary Verhaegen	ec1a9326ea	fix daily-compat.yml (#5686) For some reason the Azure Pipelines folks thought it was a good idea to search for templates starting with the current YAML file's path. Other steps (e.g. the bash script here) start from the root of the repo, though. CHANGELOG_BEGIN CHANGELOG_END	2020-04-23 13:38:57 +02:00
Moritz Kiefer	7d36402412	Initial boilerplate for cross-version compatibility testing (#5665) This is a first step towards testing cross-version compatibility. It doesn’t actually do much yet but hopefully it should be easier to parallelize once we have the initial boilerplate in place so ideally I’d like to address most missing things and issues in separate PRs. changelog_begin changelog_end	2020-04-23 12:58:11 +02:00
Gary Verhaegen	644d4c7512	add empty daily CI run (#5675) This is meant to be filled once we have a better idea of what exactly we want to test. See #5665 for current thinking about it. CHANGELOG_BEGIN CHANGELOG_END	2020-04-23 10:36:20 +02:00

19 Commits