From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B993C433F5 for ; Tue, 3 May 2022 19:13:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241778AbiECTQf (ORCPT ); Tue, 3 May 2022 15:16:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44630 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232835AbiECTQe (ORCPT ); Tue, 3 May 2022 15:16:34 -0400 Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 028553F8A9; Tue, 3 May 2022 12:13:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=YwqRVzAP9ZbrZ5VBMcTGFQUJhCIOHa56e0xcGxSMZUc=; b=g6wHrcBwrz2KwYvXjjVtORBiHn IYMsR5GaRD/VR0xedfaxvBvF7rTbYR/J22t3j5lb3f06u/Fc1RouLR18wpSiKcYUFYo1Es0dUnLOJ KS0gcPiTX1Gp43QoEXeB9qey3fro+JGWKqZb1HOFCCA1BT9LzygpBsI5rkGlNutynR2bG5HAOwC/E tPyLP92RA6GDIRm0QuhAm+c5aBu4j0UaKxwKd9I7j0kUsDE4qHndJxIdGDBq1RIB7dNA6suRfXZwh e21o4FfMtSVNbUCh0WQI73Gd4fdNm7+rXf7RsUeEiM3OYwwmkesCvbsvfiNKmEoMOIdut9FTHQRzP TtMjBb/w==; Received: from [179.113.53.197] (helo=[192.168.1.60]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim) id 1nlxwt-0003VB-Uo; Tue, 03 May 2022 21:12:40 +0200 Message-ID: Date: Tue, 3 May 2022 16:12:09 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Content-Language: en-US To: Evan Green Cc: Andrew Morton , bhe@redhat.com, pmladek@suse.com, kexec@lists.infradead.org, LKML , bcm-kernel-feedback-list@broadcom.com, linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-leds@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, Linux PM , linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org, linux-tegra@vger.kernel.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org, openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org, x86@kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net, halves@canonical.com, fabiomirmar@gmail.com, alejandro.j.jimenez@oracle.com, Andy Shevchenko , Arnd Bergmann , Borislav Petkov , Jonathan Corbet , d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com, dyoung@redhat.com, feng.tang@intel.com, Greg Kroah-Hartman , mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com, jgross@suse.com, john.ogness@linutronix.de, Kees Cook , luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com, paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org, senozhatsky@chromium.org, Alan Stern , Thomas Gleixner , vgoyal@redhat.com, vkuznets@redhat.com, Will Deacon , Ard Biesheuvel , David Gow , Julius Werner References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> From: "Guilherme G. Piccoli" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-remoteproc@vger.kernel.org On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 10E47C433EF for ; Tue, 3 May 2022 22:22:41 +0000 (UTC) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4KtDwv3YdTz3cCx for ; Wed, 4 May 2022 08:22:39 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=igalia.com header.i=@igalia.com header.a=rsa-sha256 header.s=20170329 header.b=g6wHrcBw; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=igalia.com (client-ip=178.60.130.6; helo=fanzine2.igalia.com; envelope-from=gpiccoli@igalia.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=igalia.com header.i=@igalia.com header.a=rsa-sha256 header.s=20170329 header.b=g6wHrcBw; dkim-atps=neutral Received: from fanzine2.igalia.com (fanzine.igalia.com [178.60.130.6]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4Kt8k947Knz2x9M for ; Wed, 4 May 2022 05:13:05 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=YwqRVzAP9ZbrZ5VBMcTGFQUJhCIOHa56e0xcGxSMZUc=; b=g6wHrcBwrz2KwYvXjjVtORBiHn IYMsR5GaRD/VR0xedfaxvBvF7rTbYR/J22t3j5lb3f06u/Fc1RouLR18wpSiKcYUFYo1Es0dUnLOJ KS0gcPiTX1Gp43QoEXeB9qey3fro+JGWKqZb1HOFCCA1BT9LzygpBsI5rkGlNutynR2bG5HAOwC/E tPyLP92RA6GDIRm0QuhAm+c5aBu4j0UaKxwKd9I7j0kUsDE4qHndJxIdGDBq1RIB7dNA6suRfXZwh e21o4FfMtSVNbUCh0WQI73Gd4fdNm7+rXf7RsUeEiM3OYwwmkesCvbsvfiNKmEoMOIdut9FTHQRzP TtMjBb/w==; Received: from [179.113.53.197] (helo=[192.168.1.60]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128) (Exim) id 1nlxwt-0003VB-Uo; Tue, 03 May 2022 21:12:40 +0200 Message-ID: Date: Tue, 3 May 2022 16:12:09 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Content-Language: en-US To: Evan Green References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> From: "Guilherme G. Piccoli" In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Wed, 04 May 2022 08:18:47 +1000 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-hyperv@vger.kernel.org, halves@canonical.com, David Gow , linux-xtensa@linux-xtensa.org, peterz@infradead.org, alejandro.j.jimenez@oracle.com, linux-remoteproc@vger.kernel.org, feng.tang@intel.com, linux-mips@vger.kernel.org, hidehiro.kawai.ez@hitachi.com, sparclinux@vger.kernel.org, Will Deacon , Thomas Gleixner , linux-leds@vger.kernel.org, linux-s390@vger.kernel.org, mikelley@microsoft.com, john.ogness@linutronix.de, bhe@redhat.com, Jonathan Corbet , paulmck@kernel.org, fabiomirmar@gmail.com, x86@kernel.org, Ard Biesheuvel , mingo@redhat.com, bcm-kernel-feedback-list@broadcom.com, xen-devel@lists.xenproject.org, dyoung@redhat.com, vgoyal@redhat.com, pmladek@suse.com, dave.hansen@linux.intel.com, Kees Cook , Arnd Bergmann , Linux PM , linux-um@lists.infradead.org, rostedt@goodmis.org, rcu@vger.kernel.org, Greg Kroah-Hartman , Borislav Petkov , luto@kernel.org, linux-tegra@vger.kernel.org, openipmi-developer@lists.sourceforge.net, Andy Shevchenko , vkuznets@redhat.com, linux-edac@vger.kernel.org, jgross@suse.com, linux-parisc@vger.kernel.org, netdev@vger.kernel.org, kernel@gpiccoli.net, kexec@lists.infradead.org, LKML , Alan Stern , senozhatsky@chromium.org, d.hatayama@jp.fujitsu.com, mhiramat@kernel.org, kernel-dev@igalia.com, linux-alpha@vger.kernel.org, Julius Werner , Andrew Morton , linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme From mboxrd@z Thu Jan 1 00:00:00 1970 From: Guilherme G. Piccoli Date: Tue, 3 May 2022 16:12:09 -0300 Subject: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path In-Reply-To: References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> Message-ID: List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: kexec@lists.infradead.org On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli at igalia.com/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Message-ID: Date: Tue, 3 May 2022 16:12:09 -0300 MIME-Version: 1.0 Subject: Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Content-Language: en-US References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> From: "Guilherme G. Piccoli" In-Reply-To: List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-um" Errors-To: linux-um-bounces+geert=linux-m68k.org@lists.infradead.org To: Evan Green Cc: Andrew Morton , bhe@redhat.com, pmladek@suse.com, kexec@lists.infradead.org, LKML , bcm-kernel-feedback-list@broadcom.com, linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-leds@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, Linux PM , linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org, linux-tegra@vger.kernel.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org, openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org, x86@kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net, halves@canonical.com, fabiomirmar@gmail.com, alejandro.j.jimenez@oracle.com, Andy Shevchenko , Arnd Bergmann , Borislav Petkov , Jonathan Corbet , d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com, dyoung@redhat.com, feng.tang@intel.com, Greg Kroah-Hartman , mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com, jgross@suse.com, john.ogness@linutronix.de, Kees Cook , luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com, paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org, senozhatsky@chromium.org, Alan Stern , Thomas Gleixner , vgoyal@redhat.com, vkuznets@redhat.com, Will Deacon , Ard Biesheuvel , David Gow , Julius Werner On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme _______________________________________________ linux-um mailing list linux-um@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-um From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Guilherme G. Piccoli" Subject: Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path Date: Tue, 3 May 2022 16:12:09 -0300 Message-ID: References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:From: References:Cc:To:Subject:MIME-Version:Date:Message-ID:Sender:Reply-To: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=YwqRVzAP9ZbrZ5VBMcTGFQUJhCIOHa56e0xcGxSMZUc=; b=g6wHrcBwrz2KwYvXjjVtORBiHn IYMsR5GaRD/VR0xedfaxvBvF7rTbYR/J22t3j5lb3f06u/Fc1RouLR18wpSiKcYUFYo1Es0dUnLOJ KS0gcPiTX1Gp43QoEXeB9qey3fro+JGWKqZb1HOFCCA1BT9LzygpBsI5rkGlNutynR2bG5HAOwC/E tPyLP92RA6GDIRm0QuhAm+c5aBu4j0UaKxwKd9I7j0kUsDE4qHndJxIdGDBq1RIB7dNA6suRfXZwh e21o4FfMtSVNbUCh0WQI73Gd4fdNm7+rXf7RsUeEiM3OYwwmkesCvbsvfiNKmEoMOIdut9FTHQRzP TtMjBb/ Content-Language: en-US In-Reply-To: List-ID: Content-Type: text/plain; charset="us-ascii" To: Evan Green Cc: Andrew Morton , bhe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, pmladek-IBi9RG/b67k@public.gmane.org, kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, LKML , bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w@public.gmane.org, linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org, linux-alpha-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-edac-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-hyperv-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-leds-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mips-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-parisc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linux PM , linux-remoteproc-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-um-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org, linux-xtensa-PjhNF2WwrV/0Sa2dR60CXw@public.gmane.org, netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, openipmi-developer-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org, rcu-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, sparclinux-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, xen-devel-GuqFBffKawtpuQazS67q72D2FQJk+8+b@public.gmane.org, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, kernel-dev-wEGTBA9jqPzQT0dZR+AlfA@public.gmane.org, kernel-WeLdAqEWwDvk1uMJSBkQmQ@public.gmane.org, halv On 03/05/2022 15:03, Evan Green wrote: > [...] > gsmi_shutdown_reason() is a common function called in other scenarios > as well, like reboot and thermal trip, where it may still make sense > to wait to acquire a spinlock. Maybe we should add a parameter to > gsmi_shutdown_reason() so that you can get your change on panic, but > we don't convert other callbacks into try-fail scenarios causing us to > miss logs. > Hi Evan, thanks for your feedback, much appreciated! What I've done in other cases like this was to have a helper checking the spinlock in the panic notifier - if we can acquire that, go ahead but if not, bail out. For a proper example of an implementation, check patch 13 of the series: https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli-wEGTBA9jqPzQT0dZR+AlfA@public.gmane.org/ . Do you agree with that, or prefer really a parameter in gsmi_shutdown_reason() ? I'll follow your choice =) > Though thinking more about it, is this really a Good Change (TM)? The > spinlock itself already disables interrupts, meaning the only case > where this change makes a difference is if the panic happens from > within the function that grabbed the spinlock (in which case the > callback is also likely to panic), or in an NMI that panics within > that window. The downside of this change is that if one core was > politely working through an event with the lock held, and another core > panics, we now might lose the panic log, even though it probably would > have gone through fine assuming the other core has a chance to > continue. My feeling is that this is a good change, indeed - a lot of places are getting changed like this, in this series. Reasoning: the problem with your example is that, by default, secondary CPUs are disabled in the panic path, through an IPI mechanism. IPIs take precedence and interrupt the work in these CPUs, effectively interrupting the "polite work" with the lock held heh Then, such CPU is put to sleep and we finally reach the panic notifier hereby discussed, in the main CPU. If the other CPU was shut-off *with the lock held*, it's never finishing such work, so the lock is never to be released. Conclusion: the spinlock can't be acquired, hence we broke the machine (which is already broken, given it's panic) in the path of this notifier. This should be really rare, but..possible. So I think we should protect against this scenario. We can grab others' feedback if you prefer, and of course you have the rights to refuse this change in the gsmi code, but from my point-of-view, I don't see any advantage in just assume the risk, specially since the change is very very simple. Cheers, Guilherme