After first panic, ignore others and tear down process even if in thread (#2725)

Spent a bit in a deep dive into how to handle this and honestly the
situation is rather unfortunate. The core problem is that when we have a
panic anywhere we need to tear down the app, and we'd like to do that as
cleanly as possible, avoiding throwing any other panics along the way if
possible.

We've been seeing a number of panics being reported which are
nonsensical, seemingly pointing to being a fallout panic from a worker
thread panic-ing, at which point we would write multiple panics to the
panic file, and we could possibly upload either both or the wrong panic
causing a wild goose chase. Unfortunately I've been entirely unable to
reproduce the specific panic we've been seeing but I was able to read
through the code responsible and confirm that under specific situations
a panic on one worker can cause another worker or the main thread to
also panic.

An easy solution to this is just to ignore any panics after the first
one. I'm thinking that *hopefully* we can trust the first panic to reach
the panic hook first so that the flag doesn't accidentally filter out
the panic we actually care about.

That being said we were expecting that to have already been the case
about which panic gets written to the panic file first, the first one in
the file being the one we upload, which doesn't seem to have been the
case. I'm hoping it was IO silliness causing that and that the flag
shouldn't be race-y, however this is still a shot in the dark. 🤞

As for cleanly shutting down, there's not really much we can do. One
thread physically cannot cause another to unwind without somehow sending
a message which isn't super useful. The only way for a thread to shut
down all threads and the process is to go nuclear and abort/exit the
process. This will never unwind other threads, effectively having the
same effect on those threads as compiling with `panic = "abort"` would.

With some (mis)use of `std::panic::resume_unwind` we can at least say
that for whatever thread actually panic-ed we will unwind, and any other
threads that panic as a result will probably get at least partway
through unwinding. This is weird, almost a combination of panic
rewinding and aborting, and may actually be worse than just biting the
bullet and aborting immediately.

I'm really not a fan of where I've ended up but it does seem to at the
very least an improvement. The main question in my mind at this point is
whether it would be better to attempt to unwind what we can or go all in
on abort. I'd love some input on that.

Release Notes:
- Improved panic reporting when a background thread panics.
This commit is contained in:
Julia 2023-07-17 13:52:33 -04:00 committed by GitHub
commit 3e136943c0
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -36,7 +36,7 @@ use std::{
path::{Path, PathBuf},
str,
sync::{
atomic::{AtomicBool, Ordering},
atomic::{AtomicBool, AtomicU32, Ordering},
Arc, Weak,
},
thread,
@ -405,11 +405,18 @@ struct PanicRequest {
token: String,
}
static PANIC_COUNT: AtomicU32 = AtomicU32::new(0);
fn init_panic_hook(app: &App, installation_id: Option<String>) {
let is_pty = stdout_is_a_pty();
let platform = app.platform();
panic::set_hook(Box::new(move |info| {
let prior_panic_count = PANIC_COUNT.fetch_add(1, Ordering::SeqCst);
if prior_panic_count > 0 {
std::panic::resume_unwind(Box::new(()));
}
let app_version = ZED_APP_VERSION
.or_else(|| platform.app_version().ok())
.map_or("dev".to_string(), |v| v.to_string());
@ -464,7 +471,6 @@ fn init_panic_hook(app: &App, installation_id: Option<String>) {
if is_pty {
if let Some(panic_data_json) = serde_json::to_string_pretty(&panic_data).log_err() {
eprintln!("{}", panic_data_json);
return;
}
} else {
if let Some(panic_data_json) = serde_json::to_string(&panic_data).log_err() {
@ -481,6 +487,8 @@ fn init_panic_hook(app: &App, installation_id: Option<String>) {
}
}
}
std::process::abort();
}));
}