As @edolstra pointed out that the kernel module might be painful to
maintain. I strongly disagree because it's only a small module and it's
good to have such a canary in the tests no matter how the bootup process
looks like, so I'm going the masochistic route and try to maintain it.
If it *really* becomes too much maintenance burden, we can still drop or
disable kcanary.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We don't want to push out a channel update whenever this test fails,
because that might have unexpected and confused side effects and it
*really* means that stage 1 of our boot up is broken.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We already have a small regression test for #15226 within the swraid
installer test. Unfortunately, we only check there whether the md
kthread got signalled but not whether other rampaging processes are
still alive that *should* have been killed.
So in order to do this we provide multiple canary processes which are
checked after the system has booted up:
* canary1: It's a simple forking daemon which just sleeps until it's
going to be killed. Of course we expect this process to not
be alive anymore after boot up.
* canary2: Similar to canary1, but tries to mimick a kthread to make
sure that it's going to be properly killed at the end of
stage 1.
* canary3: Like canary2, but this time using a @ in front of its
command name to actually prevent it from being killed.
* kcanary: This one is a real kthread and it runs until killed, which
shouldn't be the case.
Tested with and without 67223ee and everything works as expected, at
least on my machine.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
This is a regression test for #15226, so that the test will fail once we
accidentally kill one or more of the md kthreads (aka: if safe mode is
enabled).
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Unfortunately, pkill doesn't distinguish between kernel and user space
processes, so we need to make sure we don't accidentally kill kernel
threads.
Normally, a kernel thread ignores all signals, but there are a few that
do. A quick grep on the kernel source tree (as of kernel 4.6.0) shows
the following source files which use allow_signal():
drivers/isdn/mISDN/l1oip_core.c
drivers/md/md.c
drivers/misc/mic/cosm/cosm_scif_server.c
drivers/misc/mic/cosm_client/cosm_scif_client.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
drivers/staging/rtl8188eu/core/rtw_cmd.c
drivers/staging/rtl8712/rtl8712_cmd.c
drivers/target/iscsi/iscsi_target.c
drivers/target/iscsi/iscsi_target_login.c
drivers/target/iscsi/iscsi_target_nego.c
drivers/usb/atm/usbatm.c
drivers/usb/gadget/function/f_mass_storage.c
fs/jffs2/background.c
fs/lockd/clntlock.c
fs/lockd/svc.c
fs/nfs/nfs4state.c
fs/nfsd/nfssvc.c
While not all of these are necessarily kthreads and some functionality
may still be unimpeded, it's still quite harmful and can cause
unexpected side-effects, especially because some of these kthreads are
storage-related (which we obviously don't want to kill during bootup).
During discussion at #15226, @dezgeg suggested the following
implementation:
for pid in $(pgrep -v -f '@'); do
if [ "$(cat /proc/$pid/cmdline)" != "" ]; then
kill -9 "$pid"
fi
done
This has a few downsides:
* User space processes which use an empty string in their command line
won't be killed.
* It results in errors during bootup because some shell-related
processes are already terminated (maybe it's pgrep itself, haven't
checked).
* The @ is searched within the full command line, not just at the
beginning of the string. Of course, we already had this until now, so
it's not a problem of his implementation.
I posted an alternative implementation which doesn't suffer from the
first point, but even that one wasn't sufficient:
for pid in $(pgrep -v -f '^@'); do
readlink "/proc/$pid/exe" &> /dev/null || continue
echo "$pid"
done | xargs kill -9
This one spawns a subshell, which would be included in the processes to
kill and actually kills itself during the process.
So what we have now is even checking whether the shell process itself is
in the list to kill and avoids killing it just to be sure.
Also, we don't spawn a subshell anymore and use /proc/$pid/exe to
distinguish between user space and kernel processes like in the comments
of the following StackOverflow answer:
http://stackoverflow.com/a/12231039
We don't need to take care of terminating processes, because what we
actually want IS to terminate the processes.
The only point where this (and any previous) approach falls short if we
have processes that act like fork bombs, because they might spawn
additional processes between the pgrep and the killing. We can only
address this with process/control groups and this still won't save us
because the root user can escape from that as well.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Fixes: #15226
* Perform HTTP HEAD request instead of full GET (lighter weight)
* Don't log output of curl to the journal (it's noise/debug)
* Use explicit http:// URL scheme
* Reduce poll interval from 10s to 2s (respond to state changes
quicker). Probably not relevant on boot (lots of services compete for
the CPU), but online service restarts/reloads should be quicker.
* Pass --fail to curl (should be more robust against false positives)
* Use 4 space indent for shell code.
(cherry picked from commit 78b6e8c319)
The current postStart code holds Jenkins off the "started" state until
Jenkins becomes idle. But it should be enough to wait until Jenkins
start handling HTTP requests to consider it "started".
More reasons why the current approach is bad and we should remove it,
from @coreyoconnor in
https://github.com/NixOS/nixpkgs/issues/14991#issuecomment-216572571:
1. Repeatedly curling for a specific human-readable string to
determine "Active" is fragile. For instance, what happens when jenkins
is localized?
2. The time jenkins takes to initializes is variable. This (at least
used to) depend on the number of jobs and any plugin upgrades requested.
3. Jenkins can be requested to restart from the UI. Which will not
affect the status of the service. This means that the service being
"active" does not imply jenkins is initialized. Downstream services
cannot assume jenkins is initialized if the service is active. Might
as well accept that and remove the initialized test from service
startup.
Fixes#14991.
(cherry picked from commit 51e5beca42)
A continuation of commit 23489b34c0
("Bring back $SSL_CERT_FILE"). Quoting that commit message:
Commit 9f358f809d removed
$SSL_CERT_FILE, which is fine for binaries linking against the current
OpenSSL package, but not old binaries (e.g. those installed via
nix-env). So let's keep $SSL_CERT_FILE for a while longer.
The above patch is only applied to 'release-16.03', so do the same for
this one.
The pre-sleep service exits if any command fails. Unloading facetimehd
without it being loaded blocks subsequent commands from running.
Note: `modprobe -r` works a bit better when unloading unused modules,
and is preferrable to `rmmod`. However, the facetimehd module does not
support suspending. In this case, it seems preferable to forcefully
unload the module. `modprobe` does not support a `--force` flag when
removing, so we are left with `rmmod`.
See:
- https://github.com/NixOS/nixpkgs/pull/14883
- https://github.com/patjak/bcwc_pcie/wiki#known-issues
The pre-sleep service exits if any command fails. Unloading facetimehd
without it being loaded blocks subsequent commands from running.
Note: `modprobe -r` works a bit better when unloading unused modules,
and is preferrable to `rmmod`. However, the facetimehd module does not
support suspending. In this case, it seems preferable to forcefully
unload the module. `modprobe` does not support a `--force` flag when
removing, so we are left with `rmmod`.
See:
- https://github.com/NixOS/nixpkgs/pull/14883
- https://github.com/patjak/bcwc_pcie/wiki#known-issues
It was failing with a `Read-only filesystem` failure due to the systemd
service option `ReadWriteDirectories` not being correctly configured.
Fixes#14132
(cherry picked from commit f5951c55f7)
Continuation of 79c3c16dcb. Systemd 229
sets the default RLIMIT_CORE to infinity, causing systems to be
littered with core dumps when systemd.coredump.enable is disabled.
This restores the 15.09 soft limit of 0 and hard limit of infinity.
(cherry picked from commit 840f3230a2)
At some point we probably want to replace this with a curated list
of configurations or even an upstreamed repository of examples, but
for now this is just noise.
FixesNixOS/nixpkgs#14522
(cherry picked from commit 678e1955b1)
NixOps has infrequent releases, so it's not the best place for keeping
the list of current AMIs. Putting them in Nixpkgs means that AMI
updates will be delivered as part of the NixOS channels.
(cherry picked from commit 4e356cefd7)
This reverts commit 45c218f893.
Busybox's modprobe causes numerous "Unknown symbol" errors in the
kernel log, even though the modules do appear to load correctly.
Unetbootin works by altering the image and placing a boot loader on it.
For this reason, it cannot work with UEFI and the installation guides
for other distributions (incl. Debian and Fedora) recommend against
using it.
Since dd writes the image verbatim to the drive, and not just the files,
it is not necessary to change the label after using it for UEFI
installations.
vcunat: tiny changes to the PR. Close#14139.
(cherry picked from commit d6998b0674)
This commit implements the changes necessary to start up a graphite carbon Cache
with twisted and start the corresponding graphiteWeb service.
Dependencies need to be included via python buildEnv to include all recursive
implicit dependencies.
Additionally cairo is a requirement of graphiteWeb and pycairo is not a standard
python package (buildPythonPackage) and therefore cannot be included via
buildEnv. It also needs cairo in the Library PATH.
(cherry picked from commit 626bfce3b8)
Signed-off-by: Domen Kožar <domen@dev.si>