After first panic, ignore others and tear down process even if in thread (#2725)

I spent a while on a deep dive into how to handle this, and honestly the situation is rather unfortunate. The core problem is that when a panic occurs anywhere we need to tear down the app, and we'd like to do that as cleanly as possible, avoiding any further panics along the way.

We've been seeing a number of nonsensical panic reports that look like fallout from a worker thread panicking. In that situation we would write multiple panics to the panic file and could upload both, or the wrong one, causing a wild goose chase. Unfortunately I've been entirely unable to reproduce the specific panic we've been seeing, but I was able to read through the code responsible and confirm that under specific situations a panic on one worker can cause another worker or the main thread to also panic.

An easy solution is to ignore any panics after the first one. I'm thinking that *hopefully* we can trust the first panic to reach the panic hook first, so that the flag doesn't accidentally filter out the panic we actually care about. That being said, we were already expecting the same ordering for the panic file: the first panic gets written first, and the first one in the file is the one we upload. That doesn't seem to have been the case. I'm hoping it was IO silliness causing that and that the flag won't be racy, but this is still a shot in the dark. 🤞

As for cleanly shutting down, there's not really much we can do. One thread physically cannot cause another to unwind without somehow sending it a message, which isn't super useful. The only way for a thread to shut down all threads and the process is to go nuclear and abort/exit the process. This will never unwind other threads, effectively having the same effect on those threads as compiling with `panic = "abort"` would.

With some (mis)use of `std::panic::resume_unwind` we can at least say that whatever thread actually panicked will unwind, and any other threads that panic as a result will probably get at least partway through unwinding. This is weird, almost a combination of panic unwinding and aborting, and may actually be worse than just biting the bullet and aborting immediately. I'm really not a fan of where I've ended up, but it does seem to be at the very least an improvement.

The main question in my mind at this point is whether it would be better to attempt to unwind what we can or go all in on abort. I'd love some input on that.

Release Notes:

- Improved panic reporting when a background thread panics.
2024-09-20 02:47:34 +03:00 · 2023-07-17 13:52:33 -04:00 · 2023-07-17 13:52:33 -04:00 · 3e136943c0
commit 3e136943c0
parent 10a1df3faa 6770aeeb3c
1 changed file with 10 additions and 2 deletions
--- a/crates/zed/src/main.rs
+++ b/crates/zed/src/main.rs
@@ -36,7 +36,7 @@ use std::{
    path::{Path, PathBuf},
    str,
    sync::{
-        atomic::{AtomicBool, Ordering},
+        atomic::{AtomicBool, AtomicU32, Ordering},
        Arc, Weak,
    },
    thread,
@@ -405,11 +405,18 @@ struct PanicRequest {
    token: String,
 }

+static PANIC_COUNT: AtomicU32 = AtomicU32::new(0);
+
 fn init_panic_hook(app: &App, installation_id: Option<String>) {
    let is_pty = stdout_is_a_pty();
    let platform = app.platform();

    panic::set_hook(Box::new(move |info| {
+        let prior_panic_count = PANIC_COUNT.fetch_add(1, Ordering::SeqCst);
+        if prior_panic_count > 0 {
+            std::panic::resume_unwind(Box::new(()));
+        }
+
        let app_version = ZED_APP_VERSION
            .or_else(|| platform.app_version().ok())
            .map_or("dev".to_string(), |v| v.to_string());
@@ -464,7 +471,6 @@ fn init_panic_hook(app: &App, installation_id: Option<String>) {
        if is_pty {
            if let Some(panic_data_json) = serde_json::to_string_pretty(&panic_data).log_err() {
                eprintln!("{}", panic_data_json);
-                return;
            }
        } else {
            if let Some(panic_data_json) = serde_json::to_string(&panic_data).log_err() {
@@ -481,6 +487,8 @@ fn init_panic_hook(app: &App, installation_id: Option<String>) {
                }
            }
        }
+
+        std::process::abort();
    }));
 }
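
For readers who want the "only the first panic counts" idea outside of `main.rs`, here is a minimal standalone sketch. It assumes the same shape as the diff (a global `AtomicU32` bumped with `fetch_add`), but `claim_first_panic` and `install_first_panic_hook` are hypothetical names made up for illustration, and the reporting step is reduced to a single `eprintln!`. It is not the code from this commit.

```rust
use std::panic;
use std::process;
use std::sync::atomic::{AtomicU32, Ordering};

// Global count of panics observed by the hook, as in the diff above.
static PANIC_COUNT: AtomicU32 = AtomicU32::new(0);

/// Hypothetical helper: atomically bumps the counter and returns `true`
/// only for the caller that saw a prior value of zero. Because `fetch_add`
/// is a single atomic read-modify-write, at most one thread can win.
fn claim_first_panic(counter: &AtomicU32) -> bool {
    counter.fetch_add(1, Ordering::SeqCst) == 0
}

/// Hypothetical stand-in for `init_panic_hook`, with reporting reduced to
/// one line on stderr.
fn install_first_panic_hook() {
    panic::set_hook(Box::new(|info| {
        if !claim_first_panic(&PANIC_COUNT) {
            // A later panic: skip reporting and continue unwinding this
            // thread, mirroring the diff's use of `resume_unwind`.
            panic::resume_unwind(Box::new(()));
        }

        // First panic: report it (placeholder for writing the panic file).
        eprintln!("first panic: {}", info);

        // Tear down the whole process; other threads are not unwound.
        process::abort();
    }));
}
```

Calling `install_first_panic_hook()` once at startup would be enough to try this out. Note that the winner is whichever panic reaches the `fetch_add` first, which is not necessarily the panic that started first. That is the same caveat the commit message raises about trusting hook ordering.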