From: Brian Norris
Date: Tue, 2 Jun 2020 14:01:12 -0700
Subject: Re: [PATCH v3 5/8] ath10k: use new taint_firmware_crashed()
To: Luis Chamberlain
Cc: jeyu@kernel.org, "David S. Miller", kuba@kernel.org, linux-wireless,
 aquini@redhat.com, linux-doc@vger.kernel.org, peterz@infradead.org,
 Daniel Vetter, linux@dominikbrodowski.net, Linux Kernel,
 Masahiro Yamada, glider@google.com, GR-everest-linux-l2@marvell.com,
 mchehab+samsung@kernel.org, will@kernel.org, michael.chan@broadcom.com,
 Rob Herring, paulmck@kernel.org, bhe@redhat.com, corbet@lwn.net,
 mchehab+huawei@kernel.org, ath10k, derosier@gmail.com, Takashi Iwai,
 mingo@redhat.com, Dmitry Vyukov, Sami Tolvanen, yzaikin@google.com,
 dyoung@redhat.com, pmladek@suse.com, elver@google.com,
 sburla@marvell.com, aelior@marvell.com, Kees Cook, Arnd Bergmann,
 sfr@canb.auug.org.au, gpiccoli@canonical.com, Steven Rostedt,
 fmanlunas@marvell.com, cai@lca.pw, tglx@linutronix.de,
 Andy Shevchenko, Johannes Berg, Kalle Valo, rdunlap@infradead.org,
 schlad@suse.de, Doug Anderson, vkoul@kernel.org, mhiramat@kernel.org,
 Andrew Morton, dchickles@marvell.com, bauerman@linux.ibm.com
References: <20200526145815.6415-1-mcgrof@kernel.org>
 <20200526145815.6415-6-mcgrof@kernel.org>
In-Reply-To: <20200526145815.6415-6-mcgrof@kernel.org>
X-Mailing-List: linux-wireless@vger.kernel.org

On Tue, May 26, 2020 at 7:58 AM Luis Chamberlain wrote:
>
> This makes use of the new taint_firmware_crashed() to help
> annotate when firmware for device drivers crash. When firmware
> crashes devices can sometimes become unresponsive, and recovery
> sometimes requires a driver unload / reload and in the worst cases
> a reboot.

Just for the record, the underlying problem you seem to be complaining
about does not appear to be a firmware crash at all. It does happen to
result in a firmware crash report much later on (because when the PCIe
endpoint is this hosed, sooner or later the driver thinks the firmware
is dead), but it's not likely the root cause. More below.

> Using a taint flag allows us to annotate when this happens clearly.
>
> I have run into this situation with this driver with the latest
> firmware as of today, May 21, 2020 using v5.6.0, leaving me at
> a state at which my only option is to reboot. Driver removal and
> addition does not fix the situation. This is reported on kernel.org
> bugzilla korg#207851 [0].

I took a look, and replied there:

https://bugzilla.kernel.org/show_bug.cgi?id=207851#c2

Per the above, it seems more likely you have a PCI or power management
problem, not an ath10k or ath10k-firmware problem.

> But this isn't the first firmware crash reported,
> others have been filed before and none of these bugs have yet been
> addressed [1] [2] [3]. Including my own I see these firmware crash
> reports:

Yes, firmware does crash. Sometimes repeatedly. It also happens to be
closed source, so it's nearly impossible for the average Linux dev to
debug. But FWIW, those 3 all appear to be recoverable -- and then they
crash again a few minutes later. So just as claimed on prior iterations
of this patchset, ath10k is doing fine at recovery [*] -- it's "only"
the firmware that's a problem. (And, if a WiFi firmware doesn't like
something in the RF environment...it's totally understandable that the
crash will happen more than once. Of course that sucks, but it's not
unexpected.) Crucially, rebooting won't really do anything to help
these people, AIUI.

Maybe what you really want is to taint the kernel every time a non-free
firmware is loaded ;)

I'd also note that those 3 reports are 3 years old. There have been
many ath10k-firmware updates since then, so it's not necessarily fair
to dig those back up. Also, bugzilla.kernel.org is totally ignored by
many linux-wireless@ folks. But I digress...

All in all, I have no interest in this proposal, for many of the
reasons already mentioned on previous iterations. It's way too coarse
and won't be useful in understanding what's going on in a system, IMO,
at least for ath10k.
But it's also easy enough to ignore, so if it makes somebody happy to
claim a taint, then so be it.

Regards,
Brian

[*] Although, at least one of those doesn't appear to be as "clean" of
a recovery attempt as typical. Maybe there are some lurking driver bugs
in there too.

> * korg#207851 [0]
> * korg#197013 [1]
> * korg#201237 [2]
> * korg#195987 [3]
>
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=207851
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=197013
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=201237
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=195987
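
Illustrative sketch (not part of the original message): the "too coarse"
objection above can be modelled in a few lines of Python. Everything here
is hypothetical: CrashLog, DeviceStats and report() are names invented
for this example, and nothing in it is kernel or ath10k code. It only
assumes that a taint behaves like a sticky flag that is set on the first
crash report and never cleared. Under that assumption, a device that
crashes repeatedly but recovers every time and a device that never comes
back look identical when only the flag is inspected.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Per-device counters (hypothetical; not an existing kernel structure)."""
    crashes: int = 0
    recoveries: int = 0


class CrashLog:
    """Toy model: one sticky system-wide flag next to per-device counters."""

    def __init__(self):
        self.tainted = False   # models a taint: once set, it stays set
        self.devices = {}      # device name -> DeviceStats

    def report(self, device, recovered):
        """Record one firmware crash report for `device`."""
        self.tainted = True
        stats = self.devices.setdefault(device, DeviceStats())
        stats.crashes += 1
        if recovered:
            stats.recoveries += 1


# Scenario A: firmware crashes three times, the driver recovers each time.
a = CrashLog()
for _ in range(3):
    a.report("wlan0", recovered=True)

# Scenario B: a single crash report, after which the device stays dead.
b = CrashLog()
b.report("wlan0", recovered=False)

# The sticky flag cannot tell the two scenarios apart; the counters can.
print(a.tainted == b.tainted)
print(a.devices["wlan0"], b.devices["wlan0"])
```

In this model the flag only answers "did any crash report ever happen?",
while the counters separate the recoverable case from the unrecoverable
one. Whether finer-grained reporting would be worth having in practice
is a question for the thread, not something this sketch settles.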