From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, INVALID_DATE_TZ_ABSURD,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 111ADC19F34 for ; Tue, 3 Aug 2021 15:30:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F414660F94 for ; Tue, 3 Aug 2021 15:30:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236815AbhHCPbB (ORCPT ); Tue, 3 Aug 2021 11:31:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236741AbhHCPay (ORCPT ); Tue, 3 Aug 2021 11:30:54 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09550C0617B1 for ; Tue, 3 Aug 2021 08:30:36 -0700 (PDT) From: John Ogness DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1628004633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JYAidDTtZlwD0eAmDJGiwHLKb2zrViR5iopfW/47ycc=; b=OqjOI/FbmrK18S3ZvolMVu6MnmqMbOBo9jrdeXX3wxV7sqCVpdjTRar0H7i0ExNzY5bR/X PdP6sVQRAPdOYEMbxPgbQpXhiP/YVGhgrClPXAfjkOLUjk6nKsl2opF1S0+2EWZi8fdTVQ OyF18c9yESpVQ1ED39gaPl7oFtEgs51vYZX7ouRviNrfeq89w+mkATGMlPRG116iVVa1sP aAVNstXlI6g5+Od+hSIym7JtGUKKUBCybkbw6Tq9/SDXhAX0rTcnPPRvsN0GraIj3Cb00c qPHfZgWp/+9IGkAMNPiGT1MYOTT8EI5B06YAqc4VThaM1sSv5Bxo5RItB4a/cg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1628004633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JYAidDTtZlwD0eAmDJGiwHLKb2zrViR5iopfW/47ycc=; b=+B9KGpXLz+WiAwY5HjTHSwrgLkU2AO4ktjbvjOzrUjWKgCCoYGx8mHsRRMGq0543zI3G5c yoN+x2AyyiqMdSCw== To: Daniel Thompson Cc: Petr Mladek , Sergey Senozhatsky , Steven Rostedt , Thomas Gleixner , linux-kernel@vger.kernel.org, Michael Ellerman , Benjamin Herrenschmidt , Paul Mackerras , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" , Jason Wessel , Douglas Anderson , Srikar Dronamraju , "Gautham R. Shenoy" , Chengyang Fan , Christophe Leroy , Bhaskar Chowdhury , Nicholas Piggin , =?utf-8?Q?C=C3=A9dric?= Le Goater , "Gustavo A. R. Silva" , Peter Zijlstra , linuxppc-dev@lists.ozlabs.org, kgdb-bugreport@lists.sourceforge.net Subject: Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock In-Reply-To: <20210803142558.cz7apumpgijs5y4y@maple.lan> References: <20210803131301.5588-1-john.ogness@linutronix.de> <20210803131301.5588-4-john.ogness@linutronix.de> <20210803142558.cz7apumpgijs5y4y@maple.lan> Date: Tue, 03 Aug 2021 17:36:32 +0206 Message-ID: <87tuk635rb.fsf@jogness.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2021-08-03, Daniel Thompson wrote: > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote: >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active) >> during cpu roundup. This will conflict with the printk cpulock. > > When the full vision is realized what will be the purpose of the printk > cpulock? > > I'm asking largely because it's current role is actively unhelpful > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might > be a better (and safer) solution. However it sounds like there is a > larger role planned for the printk cpulock... The printk cpulock is used as a synchronization mechanism for implementing atomic consoles, which need to be able to safely interrupt the console write() activity at any time and immediately continue with their own printing. The ultimate goal is to move all console printing into per-console dedicated kthreads, so the primary function of the printk cpulock is really to immediately _stop_ the CPU/kthread performing write() in order to allow write_atomic() (from any context on any CPU) to safely and reliably take over. Atomic consoles are actually quite similar to the kgdb_io ops. For example, comparing: serial8250_console_write_atomic() + serial8250_console_putchar_locked() with serial8250_put_poll_char() The difference is that serial8250_console_write_atomic() is line-based and synchronizing with serial8250_console_write() so that if the kernel crashes while outputing to the console, write() can be interrupted by write_atomic() and cleanly formatted crash data can be output. Also serial8250_put_poll_char() is calling into __pm_runtime_resume(), which includes a spinlock and possibly sleeping. This would not be acceptable for atomic consoles. Although, as Andy pointed out [0], I will need to figure out how to deal with suspended consoles. Or just implement a policy that registered atomic consoles may never be suspended. I had not considered merging kgdb_io ops with atomic console ops. But now that I look at it more closely, there may be some useful overlap. I will consider this. Thank you for this idea. >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index 3d0c933937b4..1b546e117f10 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, >> #ifdef CONFIG_SMP >> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1); >> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0); >> +static unsigned int kgdb_cpu = -1; > > Is this the flag to provoke retriggering? It appears to be a write-only > variable (at least in this patch). How is it consumed? Critical catch! Thank you. I am quite unhappy to see these hunks were accidentally dropped when generating this series. @@ -3673,6 +3675,9 @@ EXPORT_SYMBOL(__printk_cpu_trylock); */ void __printk_cpu_unlock(void) { + bool trigger_kgdb = false; + unsigned int cpu; + if (atomic_read(&printk_cpulock_nested)) { atomic_dec(&printk_cpulock_nested); return; @@ -3683,6 +3688,12 @@ void __printk_cpu_unlock(void) * LMM(__printk_cpu_unlock:A) */ + cpu = smp_processor_id(); + if (kgdb_cpu == cpu) { + trigger_kgdb = true; + kgdb_cpu = -1; + } + /* * Guarantee loads and stores from this CPU when it was the * lock owner are visible to the next lock owner. This pairs @@ -3703,6 +3714,21 @@ void __printk_cpu_unlock(void) */ atomic_set_release(&printk_cpulock_owner, -1); /* LMM(__printk_cpu_unlock:B) */ + + if (trigger_kgdb) { + pr_warn("re-triggering kgdb roundup for CPU#%d\n", cpu); + kgdb_roundup_cpu(cpu); + } } EXPORT_SYMBOL(__printk_cpu_unlock); John Ogness [0] https://lore.kernel.org/lkml/YQlKAeXS9MPmE284@smile.fi.intel.com From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, INVALID_DATE_TZ_ABSURD,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11382C4338F for ; Tue, 3 Aug 2021 15:31:09 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9A7CC60555 for ; Tue, 3 Aug 2021 15:31:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9A7CC60555 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linutronix.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.ozlabs.org Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4GfJk319ywz3cLw for ; Wed, 4 Aug 2021 01:31:07 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=linutronix.de header.i=@linutronix.de header.a=rsa-sha256 header.s=2020 header.b=OqjOI/Fb; dkim=fail reason="signature verification failed" header.d=linutronix.de header.i=@linutronix.de header.a=ed25519-sha256 header.s=2020e header.b=+B9KGpXL; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linutronix.de (client-ip=2a0a:51c0:0:12e:550::1; helo=galois.linutronix.de; envelope-from=john.ogness@linutronix.de; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; secure) header.d=linutronix.de header.i=@linutronix.de header.a=rsa-sha256 header.s=2020 header.b=OqjOI/Fb; dkim=pass header.d=linutronix.de header.i=@linutronix.de header.a=ed25519-sha256 header.s=2020e header.b=+B9KGpXL; dkim-atps=neutral Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4GfJjW2zFgz2yNx for ; Wed, 4 Aug 2021 01:30:39 +1000 (AEST) From: John Ogness DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1628004633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JYAidDTtZlwD0eAmDJGiwHLKb2zrViR5iopfW/47ycc=; b=OqjOI/FbmrK18S3ZvolMVu6MnmqMbOBo9jrdeXX3wxV7sqCVpdjTRar0H7i0ExNzY5bR/X PdP6sVQRAPdOYEMbxPgbQpXhiP/YVGhgrClPXAfjkOLUjk6nKsl2opF1S0+2EWZi8fdTVQ OyF18c9yESpVQ1ED39gaPl7oFtEgs51vYZX7ouRviNrfeq89w+mkATGMlPRG116iVVa1sP aAVNstXlI6g5+Od+hSIym7JtGUKKUBCybkbw6Tq9/SDXhAX0rTcnPPRvsN0GraIj3Cb00c qPHfZgWp/+9IGkAMNPiGT1MYOTT8EI5B06YAqc4VThaM1sSv5Bxo5RItB4a/cg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1628004633; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=JYAidDTtZlwD0eAmDJGiwHLKb2zrViR5iopfW/47ycc=; b=+B9KGpXLz+WiAwY5HjTHSwrgLkU2AO4ktjbvjOzrUjWKgCCoYGx8mHsRRMGq0543zI3G5c yoN+x2AyyiqMdSCw== To: Daniel Thompson Subject: Re: [PATCH printk v1 03/10] kgdb: delay roundup if holding printk cpulock In-Reply-To: <20210803142558.cz7apumpgijs5y4y@maple.lan> References: <20210803131301.5588-1-john.ogness@linutronix.de> <20210803131301.5588-4-john.ogness@linutronix.de> <20210803142558.cz7apumpgijs5y4y@maple.lan> Date: Tue, 03 Aug 2021 17:36:32 +0206 Message-ID: <87tuk635rb.fsf@jogness.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Gautham R. Shenoy" , Douglas Anderson , Srikar Dronamraju , Peter Zijlstra , linux-kernel@vger.kernel.org, Paul Mackerras , "H. Peter Anvin" , Chengyang Fan , Bhaskar Chowdhury , x86@kernel.org, Ingo Molnar , kgdb-bugreport@lists.sourceforge.net, Petr Mladek , Nicholas Piggin , Borislav Petkov , Steven Rostedt , Thomas Gleixner , "Gustavo A. R. Silva" , Sergey Senozhatsky , Jason Wessel , linuxppc-dev@lists.ozlabs.org, =?utf-8?Q?C=C3=A9dric?= Le Goater Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 2021-08-03, Daniel Thompson wrote: > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote: >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active) >> during cpu roundup. This will conflict with the printk cpulock. > > When the full vision is realized what will be the purpose of the printk > cpulock? > > I'm asking largely because it's current role is actively unhelpful > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might > be a better (and safer) solution. However it sounds like there is a > larger role planned for the printk cpulock... The printk cpulock is used as a synchronization mechanism for implementing atomic consoles, which need to be able to safely interrupt the console write() activity at any time and immediately continue with their own printing. The ultimate goal is to move all console printing into per-console dedicated kthreads, so the primary function of the printk cpulock is really to immediately _stop_ the CPU/kthread performing write() in order to allow write_atomic() (from any context on any CPU) to safely and reliably take over. Atomic consoles are actually quite similar to the kgdb_io ops. For example, comparing: serial8250_console_write_atomic() + serial8250_console_putchar_locked() with serial8250_put_poll_char() The difference is that serial8250_console_write_atomic() is line-based and synchronizing with serial8250_console_write() so that if the kernel crashes while outputing to the console, write() can be interrupted by write_atomic() and cleanly formatted crash data can be output. Also serial8250_put_poll_char() is calling into __pm_runtime_resume(), which includes a spinlock and possibly sleeping. This would not be acceptable for atomic consoles. Although, as Andy pointed out [0], I will need to figure out how to deal with suspended consoles. Or just implement a policy that registered atomic consoles may never be suspended. I had not considered merging kgdb_io ops with atomic console ops. But now that I look at it more closely, there may be some useful overlap. I will consider this. Thank you for this idea. >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index 3d0c933937b4..1b546e117f10 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, >> #ifdef CONFIG_SMP >> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1); >> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0); >> +static unsigned int kgdb_cpu = -1; > > Is this the flag to provoke retriggering? It appears to be a write-only > variable (at least in this patch). How is it consumed? Critical catch! Thank you. I am quite unhappy to see these hunks were accidentally dropped when generating this series. @@ -3673,6 +3675,9 @@ EXPORT_SYMBOL(__printk_cpu_trylock); */ void __printk_cpu_unlock(void) { + bool trigger_kgdb = false; + unsigned int cpu; + if (atomic_read(&printk_cpulock_nested)) { atomic_dec(&printk_cpulock_nested); return; @@ -3683,6 +3688,12 @@ void __printk_cpu_unlock(void) * LMM(__printk_cpu_unlock:A) */ + cpu = smp_processor_id(); + if (kgdb_cpu == cpu) { + trigger_kgdb = true; + kgdb_cpu = -1; + } + /* * Guarantee loads and stores from this CPU when it was the * lock owner are visible to the next lock owner. This pairs @@ -3703,6 +3714,21 @@ void __printk_cpu_unlock(void) */ atomic_set_release(&printk_cpulock_owner, -1); /* LMM(__printk_cpu_unlock:B) */ + + if (trigger_kgdb) { + pr_warn("re-triggering kgdb roundup for CPU#%d\n", cpu); + kgdb_roundup_cpu(cpu); + } } EXPORT_SYMBOL(__printk_cpu_unlock); John Ogness [0] https://lore.kernel.org/lkml/YQlKAeXS9MPmE284@smile.fi.intel.com