linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/1] printk: suppress rcu stall warnings caused by slow
@ 2021-11-11 19:59 Wander Lairson Costa
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
  0 siblings, 1 reply; 17+ messages in thread
From: Wander Lairson Costa @ 2021-11-11 19:59 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness, open list
  Cc: Paul E . McKenney, Wander Lairson Costa

Ops, I generated the first version of the patch against the wrong
branch, which causes the patch failing to apply upstream.

Sorry about that.

Wander Lairson Costa (1):
  printk: suppress rcu stall warnings caused by slow console devices

 kernel/printk/printk.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 [PATCH v2 0/1] printk: suppress rcu stall warnings caused by slow Wander Lairson Costa
@ 2021-11-11 19:59 ` Wander Lairson Costa
  2021-11-11 21:13   ` Paul E. McKenney
                     ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Wander Lairson Costa @ 2021-11-11 19:59 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness, open list
  Cc: Paul E . McKenney, Wander Lairson Costa

If we have a reasonable large dataset to flush in the printk ring
buffer in the presence of a slow console device (like a serial port
with a low baud rate configured), the RCU stall detector may report
warnings.

This patch suppresses RCU stall warnings while flushing the ring buffer
to the console.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
---
 kernel/printk/printk.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cbc35d586afb..42b622454314 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2588,6 +2588,20 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
+extern int rcu_cpu_stall_suppress;
+
+static void rcu_console_stall_suppress(void)
+{
+	if (!rcu_cpu_stall_suppress)
+		rcu_cpu_stall_suppress = 4;
+}
+
+static void rcu_console_stall_unsuppress(void)
+{
+	if (rcu_cpu_stall_suppress == 4)
+		rcu_cpu_stall_suppress = 0;
+}
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2634,6 +2648,9 @@ void console_unlock(void)
 	 * and cleared after the "again" goto label.
 	 */
 	do_cond_resched = console_may_schedule;
+
+	rcu_console_stall_suppress();
+
 again:
 	console_may_schedule = 0;
 
@@ -2645,6 +2662,7 @@ void console_unlock(void)
 	if (!can_use_console()) {
 		console_locked = 0;
 		up_console_sem();
+		rcu_console_stall_unsuppress();
 		return;
 	}
 
@@ -2716,8 +2734,10 @@ void console_unlock(void)
 
 		handover = console_lock_spinning_disable_and_check();
 		printk_safe_exit_irqrestore(flags);
-		if (handover)
+		if (handover) {
+			rcu_console_stall_unsuppress();
 			return;
+		}
 
 		if (do_cond_resched)
 			cond_resched();
@@ -2738,6 +2758,8 @@ void console_unlock(void)
 	retry = prb_read_valid(prb, next_seq, NULL);
 	if (retry && console_trylock())
 		goto again;
+
+	rcu_console_stall_unsuppress();
 }
 EXPORT_SYMBOL(console_unlock);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
@ 2021-11-11 21:13   ` Paul E. McKenney
  2021-11-11 23:28   ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2021-11-11 21:13 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness, open list

On Thu, Nov 11, 2021 at 04:59:04PM -0300, Wander Lairson Costa wrote:
> If we have a reasonable large dataset to flush in the printk ring
> buffer in the presence of a slow console device (like a serial port
> with a low baud rate configured), the RCU stall detector may report
> warnings.
> 
> This patch suppresses RCU stall warnings while flushing the ring buffer
> to the console.
> 
> Signed-off-by: Wander Lairson Costa <wander@redhat.com>

From an RCU perspective:

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
>  kernel/printk/printk.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index cbc35d586afb..42b622454314 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2588,6 +2588,20 @@ static inline int can_use_console(void)
>  	return cpu_online(raw_smp_processor_id()) || have_callable_console();
>  }
>  
> +extern int rcu_cpu_stall_suppress;
> +
> +static void rcu_console_stall_suppress(void)
> +{
> +	if (!rcu_cpu_stall_suppress)
> +		rcu_cpu_stall_suppress = 4;
> +}
> +
> +static void rcu_console_stall_unsuppress(void)
> +{
> +	if (rcu_cpu_stall_suppress == 4)
> +		rcu_cpu_stall_suppress = 0;
> +}
> +
>  /**
>   * console_unlock - unlock the console system
>   *
> @@ -2634,6 +2648,9 @@ void console_unlock(void)
>  	 * and cleared after the "again" goto label.
>  	 */
>  	do_cond_resched = console_may_schedule;
> +
> +	rcu_console_stall_suppress();
> +
>  again:
>  	console_may_schedule = 0;
>  
> @@ -2645,6 +2662,7 @@ void console_unlock(void)
>  	if (!can_use_console()) {
>  		console_locked = 0;
>  		up_console_sem();
> +		rcu_console_stall_unsuppress();
>  		return;
>  	}
>  
> @@ -2716,8 +2734,10 @@ void console_unlock(void)
>  
>  		handover = console_lock_spinning_disable_and_check();
>  		printk_safe_exit_irqrestore(flags);
> -		if (handover)
> +		if (handover) {
> +			rcu_console_stall_unsuppress();
>  			return;
> +		}
>  
>  		if (do_cond_resched)
>  			cond_resched();
> @@ -2738,6 +2758,8 @@ void console_unlock(void)
>  	retry = prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
> +
> +	rcu_console_stall_unsuppress();
>  }
>  EXPORT_SYMBOL(console_unlock);
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
  2021-11-11 21:13   ` Paul E. McKenney
@ 2021-11-11 23:28   ` kernel test robot
  2021-11-11 23:33   ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-11-11 23:28 UTC (permalink / raw)
  To: Wander Lairson Costa, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, John Ogness, linux-kernel
  Cc: kbuild-all, Paul E . McKenney, Wander Lairson Costa

[-- Attachment #1: Type: text/plain, Size: 2162 bytes --]

Hi Wander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.15 next-20211111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Wander-Lairson-Costa/printk-suppress-rcu-stall-warnings-caused-by-slow/20211112-040059
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git debe436e77c72fcee804fb867f275e6d31aa999c
config: m68k-defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb0661fcac8e6a708539fe465c8ad7f74ca920fa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wander-Lairson-Costa/printk-suppress-rcu-stall-warnings-caused-by-slow/20211112-040059
        git checkout bb0661fcac8e6a708539fe465c8ad7f74ca920fa
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=m68k SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   m68k-linux-ld: kernel/printk/printk.o: in function `console_unlock':
   printk.c:(.text+0x16f6): undefined reference to `rcu_cpu_stall_suppress'
>> m68k-linux-ld: printk.c:(.text+0x1890): undefined reference to `rcu_cpu_stall_suppress'
   m68k-linux-ld: printk.c:(.text+0x16ec): undefined reference to `rcu_cpu_stall_suppress'
   m68k-linux-ld: printk.c:(.text+0x1888): undefined reference to `rcu_cpu_stall_suppress'
   m68k-linux-ld: printk.c:(.text+0x1982): undefined reference to `rcu_cpu_stall_suppress'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 17445 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
  2021-11-11 21:13   ` Paul E. McKenney
  2021-11-11 23:28   ` kernel test robot
@ 2021-11-11 23:33   ` kernel test robot
  2021-11-12  1:41   ` Sergey Senozhatsky
  2021-11-12  8:45   ` Petr Mladek
  4 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-11-11 23:33 UTC (permalink / raw)
  To: Wander Lairson Costa, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt, John Ogness, linux-kernel
  Cc: kbuild-all, Paul E . McKenney, Wander Lairson Costa

[-- Attachment #1: Type: text/plain, Size: 1889 bytes --]

Hi Wander,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.15 next-20211111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Wander-Lairson-Costa/printk-suppress-rcu-stall-warnings-caused-by-slow/20211112-040059
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git debe436e77c72fcee804fb867f275e6d31aa999c
config: csky-defconfig (attached as .config)
compiler: csky-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb0661fcac8e6a708539fe465c8ad7f74ca920fa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Wander-Lairson-Costa/printk-suppress-rcu-stall-warnings-caused-by-slow/20211112-040059
        git checkout bb0661fcac8e6a708539fe465c8ad7f74ca920fa
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=csky SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   csky-linux-ld: kernel/printk/printk.o: in function `console_unlock':
   printk.c:(.text+0x1bf4): undefined reference to `rcu_cpu_stall_suppress'
>> csky-linux-ld: printk.c:(.text+0x1d78): undefined reference to `rcu_cpu_stall_suppress'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 10458 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
                     ` (2 preceding siblings ...)
  2021-11-11 23:33   ` kernel test robot
@ 2021-11-12  1:41   ` Sergey Senozhatsky
  2021-11-12  2:23     ` Paul E. McKenney
  2021-11-12 14:42     ` Wander Costa
  2021-11-12  8:45   ` Petr Mladek
  4 siblings, 2 replies; 17+ messages in thread
From: Sergey Senozhatsky @ 2021-11-12  1:41 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, John Ogness,
	open list, Paul E . McKenney

On (21/11/11 16:59), Wander Lairson Costa wrote:
> 
> If we have a reasonable large dataset to flush in the printk ring
> buffer in the presence of a slow console device (like a serial port
> with a low baud rate configured), the RCU stall detector may report
> warnings.
> 
> This patch suppresses RCU stall warnings while flushing the ring buffer
> to the console.
> 
[..]
> +extern int rcu_cpu_stall_suppress;
> +
> +static void rcu_console_stall_suppress(void)
> +{
> +	if (!rcu_cpu_stall_suppress)
> +		rcu_cpu_stall_suppress = 4;
> +}
> +
> +static void rcu_console_stall_unsuppress(void)
> +{
> +	if (rcu_cpu_stall_suppress == 4)
> +		rcu_cpu_stall_suppress = 0;
> +}
> +
>  /**
>   * console_unlock - unlock the console system
>   *
> @@ -2634,6 +2648,9 @@ void console_unlock(void)
>  	 * and cleared after the "again" goto label.
>  	 */
>  	do_cond_resched = console_may_schedule;
> +
> +	rcu_console_stall_suppress();
> +
>  again:
>  	console_may_schedule = 0;
>  
> @@ -2645,6 +2662,7 @@ void console_unlock(void)
>  	if (!can_use_console()) {
>  		console_locked = 0;
>  		up_console_sem();
> +		rcu_console_stall_unsuppress();
>  		return;
>  	}
>  
> @@ -2716,8 +2734,10 @@ void console_unlock(void)
>  
>  		handover = console_lock_spinning_disable_and_check();
>  		printk_safe_exit_irqrestore(flags);
> -		if (handover)
> +		if (handover) {
> +			rcu_console_stall_unsuppress();
>  			return;
> +		}
>  
>  		if (do_cond_resched)
>  			cond_resched();
> @@ -2738,6 +2758,8 @@ void console_unlock(void)
>  	retry = prb_read_valid(prb, next_seq, NULL);
>  	if (retry && console_trylock())
>  		goto again;
> +
> +	rcu_console_stall_unsuppress();
>  }

May be we can just start touching watchdogs from printing routine?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12  1:41   ` Sergey Senozhatsky
@ 2021-11-12  2:23     ` Paul E. McKenney
  2021-11-12 14:42     ` Wander Costa
  1 sibling, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2021-11-12  2:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Wander Lairson Costa, Petr Mladek, Steven Rostedt, John Ogness,
	open list

On Fri, Nov 12, 2021 at 10:41:55AM +0900, Sergey Senozhatsky wrote:
> On (21/11/11 16:59), Wander Lairson Costa wrote:
> > 
> > If we have a reasonable large dataset to flush in the printk ring
> > buffer in the presence of a slow console device (like a serial port
> > with a low baud rate configured), the RCU stall detector may report
> > warnings.
> > 
> > This patch suppresses RCU stall warnings while flushing the ring buffer
> > to the console.
> > 
> [..]
> > +extern int rcu_cpu_stall_suppress;
> > +
> > +static void rcu_console_stall_suppress(void)
> > +{
> > +	if (!rcu_cpu_stall_suppress)
> > +		rcu_cpu_stall_suppress = 4;
> > +}
> > +
> > +static void rcu_console_stall_unsuppress(void)
> > +{
> > +	if (rcu_cpu_stall_suppress == 4)
> > +		rcu_cpu_stall_suppress = 0;
> > +}
> > +
> >  /**
> >   * console_unlock - unlock the console system
> >   *
> > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> >  	 * and cleared after the "again" goto label.
> >  	 */
> >  	do_cond_resched = console_may_schedule;
> > +
> > +	rcu_console_stall_suppress();
> > +
> >  again:
> >  	console_may_schedule = 0;
> >  
> > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> >  	if (!can_use_console()) {
> >  		console_locked = 0;
> >  		up_console_sem();
> > +		rcu_console_stall_unsuppress();
> >  		return;
> >  	}
> >  
> > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> >  
> >  		handover = console_lock_spinning_disable_and_check();
> >  		printk_safe_exit_irqrestore(flags);
> > -		if (handover)
> > +		if (handover) {
> > +			rcu_console_stall_unsuppress();
> >  			return;
> > +		}
> >  
> >  		if (do_cond_resched)
> >  			cond_resched();
> > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> >  	retry = prb_read_valid(prb, next_seq, NULL);
> >  	if (retry && console_trylock())
> >  		goto again;
> > +
> > +	rcu_console_stall_unsuppress();
> >  }
> 
> May be we can just start touching watchdogs from printing routine?

You could invoke cond_resched() periodically and keep RCU happy.

But if you also get stall warnings in PREEMPT=y kernels, something
more is required.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
                     ` (3 preceding siblings ...)
  2021-11-12  1:41   ` Sergey Senozhatsky
@ 2021-11-12  8:45   ` Petr Mladek
  2021-11-12 14:08     ` Wander Costa
  4 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2021-11-12  8:45 UTC (permalink / raw)
  To: Wander Lairson Costa
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, open list,
	Paul E . McKenney

On Thu 2021-11-11 16:59:04, Wander Lairson Costa wrote:
> If we have a reasonable large dataset to flush in the printk ring
> buffer in the presence of a slow console device (like a serial port
> with a low baud rate configured), the RCU stall detector may report
> warnings.
> 
> This patch suppresses RCU stall warnings while flushing the ring buffer
> to the console.

I have mixed feelings about this patch.

We already touch watchdogs in several situations to avoid this type
of reports. So, there is a precedent.

On the other hand, it hides a real problem. printk() might cause
softlockups, livelockups, and even deadlocks. It might really stall
other processes and CPUs. And the stall report might help to
understand that the system is not longer responsive because of
printk().

John Ogness is working on the proper solution, handling consoles
in a kthread. It is not that easy because we need to handle situations
when kthreads do not work, for example, during early boot, panic(),
suspend, reboot, kexec. But I believe that we will have it rather
sooner than later.

A workaround, is to lower console_loglevel and show only the most
important messages. Sometimes, a reasonable solution is to ratelimit
repeated messages.

Which brings the question. What is the motivation for this patch,
please?

Is it motivated by a particular bug report?
Or does the experience shows that this report causes more harm than
good?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12  8:45   ` Petr Mladek
@ 2021-11-12 14:08     ` Wander Costa
  2021-11-15 10:10       ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Wander Costa @ 2021-11-12 14:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Wander Lairson Costa, Sergey Senozhatsky, Steven Rostedt,
	John Ogness, open list, Paul E . McKenney

On Fri, Nov 12, 2021 at 5:45 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2021-11-11 16:59:04, Wander Lairson Costa wrote:
> > If we have a reasonable large dataset to flush in the printk ring
> > buffer in the presence of a slow console device (like a serial port
> > with a low baud rate configured), the RCU stall detector may report
> > warnings.
> >
> > This patch suppresses RCU stall warnings while flushing the ring buffer
> > to the console.
>
> I have mixed feelings about this patch.
>
> We already touch watchdogs in several situations to avoid this type
> of reports. So, there is a precedent.
>
> On the other hand, it hides a real problem. printk() might cause
> softlockups, livelockups, and even deadlocks. It might really stall
> other processes and CPUs. And the stall report might help to
> understand that the system is not longer responsive because of
> printk().
>
Indeed, other use cases might care about these stalls in printk.
One workaround is to create a syctl to make this feature optional.

> John Ogness is working on the proper solution, handling consoles
> in a kthread. It is not that easy because we need to handle situations
> when kthreads do not work, for example, during early boot, panic(),
> suspend, reboot, kexec. But I believe that we will have it rather
> sooner than later.
>
I gave a try to the print threads patch set and it does solve the
problem. I am not
opposite at all to wait for them to land if it happens in the
foreseeable future.

> A workaround, is to lower console_loglevel and show only the most
> important messages. Sometimes, a reasonable solution is to ratelimit
> repeated messages.
>
> Which brings the question. What is the motivation for this patch,
> please?
>
> Is it motivated by a particular bug report?
> Or does the experience shows that this report causes more harm than
> good?
>
QA has a test case in which they need to load hundreds of SCSI devices,
and they simulate it using the scsi_debug driver:

modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600

This dumps a bunch of messages to print and the serial console driver
cannot keep up with the data rate, causing an RCU stall. The stall is reported
in an IRQ context, then the ring buffer flush continues from there,
and then it causes
a soft lockup.

> Best Regards,
> Petr
>

On Fri, Nov 12, 2021 at 5:45 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Thu 2021-11-11 16:59:04, Wander Lairson Costa wrote:
> > If we have a reasonable large dataset to flush in the printk ring
> > buffer in the presence of a slow console device (like a serial port
> > with a low baud rate configured), the RCU stall detector may report
> > warnings.
> >
> > This patch suppresses RCU stall warnings while flushing the ring buffer
> > to the console.
>
> I have mixed feelings about this patch.
>
> We already touch watchdogs in several situations to avoid this type
> of reports. So, there is a precedent.
>
> On the other hand, it hides a real problem. printk() might cause
> softlockups, livelockups, and even deadlocks. It might really stall
> other processes and CPUs. And the stall report might help to
> understand that the system is not longer responsive because of
> printk().
>
> John Ogness is working on the proper solution, handling consoles
> in a kthread. It is not that easy because we need to handle situations
> when kthreads do not work, for example, during early boot, panic(),
> suspend, reboot, kexec. But I believe that we will have it rather
> sooner than later.
>
> A workaround, is to lower console_loglevel and show only the most
> important messages. Sometimes, a reasonable solution is to ratelimit
> repeated messages.
>
> Which brings the question. What is the motivation for this patch,
> please?
>
> Is it motivated by a particular bug report?
> Or does the experience shows that this report causes more harm than
> good?
>
> Best Regards,
> Petr
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12  1:41   ` Sergey Senozhatsky
  2021-11-12  2:23     ` Paul E. McKenney
@ 2021-11-12 14:42     ` Wander Costa
  2021-11-12 14:57       ` Paul E. McKenney
  1 sibling, 1 reply; 17+ messages in thread
From: Wander Costa @ 2021-11-12 14:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Wander Lairson Costa, Petr Mladek, Steven Rostedt, John Ogness,
	open list, Paul E . McKenney

On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (21/11/11 16:59), Wander Lairson Costa wrote:
> >
> > If we have a reasonable large dataset to flush in the printk ring
> > buffer in the presence of a slow console device (like a serial port
> > with a low baud rate configured), the RCU stall detector may report
> > warnings.
> >
> > This patch suppresses RCU stall warnings while flushing the ring buffer
> > to the console.
> >
> [..]
> > +extern int rcu_cpu_stall_suppress;
> > +
> > +static void rcu_console_stall_suppress(void)
> > +{
> > +     if (!rcu_cpu_stall_suppress)
> > +             rcu_cpu_stall_suppress = 4;
> > +}
> > +
> > +static void rcu_console_stall_unsuppress(void)
> > +{
> > +     if (rcu_cpu_stall_suppress == 4)
> > +             rcu_cpu_stall_suppress = 0;
> > +}
> > +
> >  /**
> >   * console_unlock - unlock the console system
> >   *
> > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> >        * and cleared after the "again" goto label.
> >        */
> >       do_cond_resched = console_may_schedule;
> > +
> > +     rcu_console_stall_suppress();
> > +
> >  again:
> >       console_may_schedule = 0;
> >
> > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> >       if (!can_use_console()) {
> >               console_locked = 0;
> >               up_console_sem();
> > +             rcu_console_stall_unsuppress();
> >               return;
> >       }
> >
> > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> >
> >               handover = console_lock_spinning_disable_and_check();
> >               printk_safe_exit_irqrestore(flags);
> > -             if (handover)
> > +             if (handover) {
> > +                     rcu_console_stall_unsuppress();
> >                       return;
> > +             }
> >
> >               if (do_cond_resched)
> >                       cond_resched();
> > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> >       retry = prb_read_valid(prb, next_seq, NULL);
> >       if (retry && console_trylock())
> >               goto again;
> > +
> > +     rcu_console_stall_unsuppress();
> >  }
>
> May be we can just start touching watchdogs from printing routine?
>
Hrm, console_unlock is called from vprintk_emit [0] with preemption
disabled. and it already has the logic implemented to call
cond_resched when possible [1].

[0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
[1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12 14:42     ` Wander Costa
@ 2021-11-12 14:57       ` Paul E. McKenney
  2021-11-12 14:58         ` Wander Costa
  2022-01-07 18:53         ` Paul E. McKenney
  0 siblings, 2 replies; 17+ messages in thread
From: Paul E. McKenney @ 2021-11-12 14:57 UTC (permalink / raw)
  To: Wander Costa
  Cc: Sergey Senozhatsky, Wander Lairson Costa, Petr Mladek,
	Steven Rostedt, John Ogness, open list

On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> <senozhatsky@chromium.org> wrote:
> >
> > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > >
> > > If we have a reasonable large dataset to flush in the printk ring
> > > buffer in the presence of a slow console device (like a serial port
> > > with a low baud rate configured), the RCU stall detector may report
> > > warnings.
> > >
> > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > to the console.
> > >
> > [..]
> > > +extern int rcu_cpu_stall_suppress;
> > > +
> > > +static void rcu_console_stall_suppress(void)
> > > +{
> > > +     if (!rcu_cpu_stall_suppress)
> > > +             rcu_cpu_stall_suppress = 4;
> > > +}
> > > +
> > > +static void rcu_console_stall_unsuppress(void)
> > > +{
> > > +     if (rcu_cpu_stall_suppress == 4)
> > > +             rcu_cpu_stall_suppress = 0;
> > > +}
> > > +
> > >  /**
> > >   * console_unlock - unlock the console system
> > >   *
> > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > >        * and cleared after the "again" goto label.
> > >        */
> > >       do_cond_resched = console_may_schedule;
> > > +
> > > +     rcu_console_stall_suppress();
> > > +
> > >  again:
> > >       console_may_schedule = 0;
> > >
> > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > >       if (!can_use_console()) {
> > >               console_locked = 0;
> > >               up_console_sem();
> > > +             rcu_console_stall_unsuppress();
> > >               return;
> > >       }
> > >
> > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > >
> > >               handover = console_lock_spinning_disable_and_check();
> > >               printk_safe_exit_irqrestore(flags);
> > > -             if (handover)
> > > +             if (handover) {
> > > +                     rcu_console_stall_unsuppress();
> > >                       return;
> > > +             }
> > >
> > >               if (do_cond_resched)
> > >                       cond_resched();
> > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > >       retry = prb_read_valid(prb, next_seq, NULL);
> > >       if (retry && console_trylock())
> > >               goto again;
> > > +
> > > +     rcu_console_stall_unsuppress();
> > >  }
> >
> > May be we can just start touching watchdogs from printing routine?
> >
> Hrm, console_unlock is called from vprintk_emit [0] with preemption
> disabled. and it already has the logic implemented to call
> cond_resched when possible [1].
> 
> [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719

So when we are having problems is when console_may_schedule == 0?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12 14:57       ` Paul E. McKenney
@ 2021-11-12 14:58         ` Wander Costa
  2022-01-07 18:53         ` Paul E. McKenney
  1 sibling, 0 replies; 17+ messages in thread
From: Wander Costa @ 2021-11-12 14:58 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Sergey Senozhatsky, Wander Lairson Costa, Petr Mladek,
	Steven Rostedt, John Ogness, open list

On Fri, Nov 12, 2021 at 11:58 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> > On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> > <senozhatsky@chromium.org> wrote:
> > >
> > > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > > >
> > > > If we have a reasonable large dataset to flush in the printk ring
> > > > buffer in the presence of a slow console device (like a serial port
> > > > with a low baud rate configured), the RCU stall detector may report
> > > > warnings.
> > > >
> > > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > > to the console.
> > > >
> > > [..]
> > > > +extern int rcu_cpu_stall_suppress;
> > > > +
> > > > +static void rcu_console_stall_suppress(void)
> > > > +{
> > > > +     if (!rcu_cpu_stall_suppress)
> > > > +             rcu_cpu_stall_suppress = 4;
> > > > +}
> > > > +
> > > > +static void rcu_console_stall_unsuppress(void)
> > > > +{
> > > > +     if (rcu_cpu_stall_suppress == 4)
> > > > +             rcu_cpu_stall_suppress = 0;
> > > > +}
> > > > +
> > > >  /**
> > > >   * console_unlock - unlock the console system
> > > >   *
> > > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > > >        * and cleared after the "again" goto label.
> > > >        */
> > > >       do_cond_resched = console_may_schedule;
> > > > +
> > > > +     rcu_console_stall_suppress();
> > > > +
> > > >  again:
> > > >       console_may_schedule = 0;
> > > >
> > > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > > >       if (!can_use_console()) {
> > > >               console_locked = 0;
> > > >               up_console_sem();
> > > > +             rcu_console_stall_unsuppress();
> > > >               return;
> > > >       }
> > > >
> > > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > > >
> > > >               handover = console_lock_spinning_disable_and_check();
> > > >               printk_safe_exit_irqrestore(flags);
> > > > -             if (handover)
> > > > +             if (handover) {
> > > > +                     rcu_console_stall_unsuppress();
> > > >                       return;
> > > > +             }
> > > >
> > > >               if (do_cond_resched)
> > > >                       cond_resched();
> > > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > > >       retry = prb_read_valid(prb, next_seq, NULL);
> > > >       if (retry && console_trylock())
> > > >               goto again;
> > > > +
> > > > +     rcu_console_stall_unsuppress();
> > > >  }
> > >
> > > May be we can just start touching watchdogs from printing routine?
> > >
> > Hrm, console_unlock is called from vprintk_emit [0] with preemption
> > disabled. and it already has the logic implemented to call
> > cond_resched when possible [1].
> >
> > [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> > [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719
>
> So when we are having problems is when console_may_schedule == 0?
>
Exactly.
>                                                         Thanx, Paul
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12 14:08     ` Wander Costa
@ 2021-11-15 10:10       ` Petr Mladek
  0 siblings, 0 replies; 17+ messages in thread
From: Petr Mladek @ 2021-11-15 10:10 UTC (permalink / raw)
  To: Wander Costa
  Cc: Wander Lairson Costa, Sergey Senozhatsky, Steven Rostedt,
	John Ogness, open list, Paul E . McKenney

On Fri 2021-11-12 11:08:33, Wander Costa wrote:
> On Fri, Nov 12, 2021 at 5:45 AM Petr Mladek <pmladek@suse.com> wrote:
> > A workaround, is to lower console_loglevel and show only the most
> > important messages. Sometimes, a reasonable solution is to ratelimit
> > repeated messages.
> >
> > Which brings the question. What is the motivation for this patch,
> > please?
> >
> > Is it motivated by a particular bug report?
> > Or does the experience shows that this report causes more harm than
> > good?
> >
> QA has a test case in which they need to load hundreds of SCSI devices,
> and they simulate it using the scsi_debug driver:

I think that SCSI devices were the first sinner who motivated the work
on console offloading here at SUSE.

> modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600
> 
> This dumps a bunch of messages to print and the serial console driver
> cannot keep up with the data rate, causing an RCU stall. The stall is reported
> in an IRQ context, then the ring buffer flush continues from there,
> and then it causes
> a soft lockup.

I usually suggest to reduce console_loglevel as a temporary solution.
But I am not sure if it is acceptable in QA.

It might be done only around this test. I mean something like:

CONSOLE_LOGLEVEL=`cat /proc/sys/kernel/printk`
IGNORE_LOGLEVEL=`cat /sys/module/printk/parameters/ignore_loglevel`
echo "3 4 1 7" >/proc/sys/kernel/printk
echo N >/sys/module/printk/parameters/ignore_loglevel

modprobe scsi_debug virtual_gb=1 add_host=2 num_tgts=600

echo $CONSOLE_LOGLEVEL >/proc/sys/kernel/printk
echo $IGNORE_LOGLEVEL  >/sys/module/printk/parameters/ignore_loglevel


Where /proc/sys/kernel/printk is a horrible interface. The first
number is important. It defines the limit used for filtering messages.
The levels are defined in include/linux/kern_levels.h.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2021-11-12 14:57       ` Paul E. McKenney
  2021-11-12 14:58         ` Wander Costa
@ 2022-01-07 18:53         ` Paul E. McKenney
  2022-01-07 20:16           ` Wander Costa
  1 sibling, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2022-01-07 18:53 UTC (permalink / raw)
  To: Wander Costa
  Cc: Sergey Senozhatsky, Wander Lairson Costa, Petr Mladek,
	Steven Rostedt, John Ogness, open list

On Fri, Nov 12, 2021 at 06:57:55AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> > On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> > <senozhatsky@chromium.org> wrote:
> > >
> > > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > > >
> > > > If we have a reasonable large dataset to flush in the printk ring
> > > > buffer in the presence of a slow console device (like a serial port
> > > > with a low baud rate configured), the RCU stall detector may report
> > > > warnings.
> > > >
> > > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > > to the console.
> > > >
> > > [..]
> > > > +extern int rcu_cpu_stall_suppress;
> > > > +
> > > > +static void rcu_console_stall_suppress(void)
> > > > +{
> > > > +     if (!rcu_cpu_stall_suppress)
> > > > +             rcu_cpu_stall_suppress = 4;
> > > > +}
> > > > +
> > > > +static void rcu_console_stall_unsuppress(void)
> > > > +{
> > > > +     if (rcu_cpu_stall_suppress == 4)
> > > > +             rcu_cpu_stall_suppress = 0;
> > > > +}
> > > > +
> > > >  /**
> > > >   * console_unlock - unlock the console system
> > > >   *
> > > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > > >        * and cleared after the "again" goto label.
> > > >        */
> > > >       do_cond_resched = console_may_schedule;
> > > > +
> > > > +     rcu_console_stall_suppress();
> > > > +
> > > >  again:
> > > >       console_may_schedule = 0;
> > > >
> > > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > > >       if (!can_use_console()) {
> > > >               console_locked = 0;
> > > >               up_console_sem();
> > > > +             rcu_console_stall_unsuppress();
> > > >               return;
> > > >       }
> > > >
> > > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > > >
> > > >               handover = console_lock_spinning_disable_and_check();
> > > >               printk_safe_exit_irqrestore(flags);
> > > > -             if (handover)
> > > > +             if (handover) {
> > > > +                     rcu_console_stall_unsuppress();
> > > >                       return;
> > > > +             }
> > > >
> > > >               if (do_cond_resched)
> > > >                       cond_resched();
> > > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > > >       retry = prb_read_valid(prb, next_seq, NULL);
> > > >       if (retry && console_trylock())
> > > >               goto again;
> > > > +
> > > > +     rcu_console_stall_unsuppress();
> > > >  }
> > >
> > > May be we can just start touching watchdogs from printing routine?
> > >
> > Hrm, console_unlock is called from vprintk_emit [0] with preemption
> > disabled. and it already has the logic implemented to call
> > cond_resched when possible [1].
> > 
> > [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> > [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719
> 
> So when we are having problems is when console_may_schedule == 0?

Just following up...  Any progress on this?  The ability to suppress RCU
CPU stall warnings due to console slowness would likely be valuable to
quite a few people.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2022-01-07 18:53         ` Paul E. McKenney
@ 2022-01-07 20:16           ` Wander Costa
  2022-01-10 11:10             ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Wander Costa @ 2022-01-07 20:16 UTC (permalink / raw)
  To: Paul E . McKenney
  Cc: Sergey Senozhatsky, Wander Lairson Costa, Petr Mladek,
	Steven Rostedt, John Ogness, open list

On Fri, Jan 7, 2022 at 4:03 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Nov 12, 2021 at 06:57:55AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> > > On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> > > <senozhatsky@chromium.org> wrote:
> > > >
> > > > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > > > >
> > > > > If we have a reasonable large dataset to flush in the printk ring
> > > > > buffer in the presence of a slow console device (like a serial port
> > > > > with a low baud rate configured), the RCU stall detector may report
> > > > > warnings.
> > > > >
> > > > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > > > to the console.
> > > > >
> > > > [..]
> > > > > +extern int rcu_cpu_stall_suppress;
> > > > > +
> > > > > +static void rcu_console_stall_suppress(void)
> > > > > +{
> > > > > +     if (!rcu_cpu_stall_suppress)
> > > > > +             rcu_cpu_stall_suppress = 4;
> > > > > +}
> > > > > +
> > > > > +static void rcu_console_stall_unsuppress(void)
> > > > > +{
> > > > > +     if (rcu_cpu_stall_suppress == 4)
> > > > > +             rcu_cpu_stall_suppress = 0;
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * console_unlock - unlock the console system
> > > > >   *
> > > > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > > > >        * and cleared after the "again" goto label.
> > > > >        */
> > > > >       do_cond_resched = console_may_schedule;
> > > > > +
> > > > > +     rcu_console_stall_suppress();
> > > > > +
> > > > >  again:
> > > > >       console_may_schedule = 0;
> > > > >
> > > > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > > > >       if (!can_use_console()) {
> > > > >               console_locked = 0;
> > > > >               up_console_sem();
> > > > > +             rcu_console_stall_unsuppress();
> > > > >               return;
> > > > >       }
> > > > >
> > > > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > > > >
> > > > >               handover = console_lock_spinning_disable_and_check();
> > > > >               printk_safe_exit_irqrestore(flags);
> > > > > -             if (handover)
> > > > > +             if (handover) {
> > > > > +                     rcu_console_stall_unsuppress();
> > > > >                       return;
> > > > > +             }
> > > > >
> > > > >               if (do_cond_resched)
> > > > >                       cond_resched();
> > > > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > > > >       retry = prb_read_valid(prb, next_seq, NULL);
> > > > >       if (retry && console_trylock())
> > > > >               goto again;
> > > > > +
> > > > > +     rcu_console_stall_unsuppress();
> > > > >  }
> > > >
> > > > May be we can just start touching watchdogs from printing routine?
> > > >
> > > Hrm, console_unlock is called from vprintk_emit [0] with preemption
> > > disabled. and it already has the logic implemented to call
> > > cond_resched when possible [1].
> > >
> > > [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> > > [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719
> >
> > So when we are having problems is when console_may_schedule == 0?
>
> Just following up...  Any progress on this?  The ability to suppress RCU
> CPU stall warnings due to console slowness would likely be valuable to
> quite a few people.
>

My understanding is that the consensus is that the proper fix is the
printk threads currently under work and it wouldn't take long before
it is ready to review.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2022-01-07 20:16           ` Wander Costa
@ 2022-01-10 11:10             ` Petr Mladek
  2022-01-17 11:26               ` Wander Costa
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2022-01-10 11:10 UTC (permalink / raw)
  To: Wander Costa
  Cc: Paul E . McKenney, Sergey Senozhatsky, Wander Lairson Costa,
	Steven Rostedt, John Ogness, open list

On Fri 2022-01-07 17:16:23, Wander Costa wrote:
> On Fri, Jan 7, 2022 at 4:03 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Nov 12, 2021 at 06:57:55AM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> > > > On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> > > > <senozhatsky@chromium.org> wrote:
> > > > >
> > > > > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > > > > >
> > > > > > If we have a reasonable large dataset to flush in the printk ring
> > > > > > buffer in the presence of a slow console device (like a serial port
> > > > > > with a low baud rate configured), the RCU stall detector may report
> > > > > > warnings.
> > > > > >
> > > > > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > > > > to the console.
> > > > > >
> > > > > [..]
> > > > > > +extern int rcu_cpu_stall_suppress;
> > > > > > +
> > > > > > +static void rcu_console_stall_suppress(void)
> > > > > > +{
> > > > > > +     if (!rcu_cpu_stall_suppress)
> > > > > > +             rcu_cpu_stall_suppress = 4;
> > > > > > +}
> > > > > > +
> > > > > > +static void rcu_console_stall_unsuppress(void)
> > > > > > +{
> > > > > > +     if (rcu_cpu_stall_suppress == 4)
> > > > > > +             rcu_cpu_stall_suppress = 0;
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * console_unlock - unlock the console system
> > > > > >   *
> > > > > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > > > > >        * and cleared after the "again" goto label.
> > > > > >        */
> > > > > >       do_cond_resched = console_may_schedule;
> > > > > > +
> > > > > > +     rcu_console_stall_suppress();
> > > > > > +
> > > > > >  again:
> > > > > >       console_may_schedule = 0;
> > > > > >
> > > > > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > > > > >       if (!can_use_console()) {
> > > > > >               console_locked = 0;
> > > > > >               up_console_sem();
> > > > > > +             rcu_console_stall_unsuppress();
> > > > > >               return;
> > > > > >       }
> > > > > >
> > > > > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > > > > >
> > > > > >               handover = console_lock_spinning_disable_and_check();
> > > > > >               printk_safe_exit_irqrestore(flags);
> > > > > > -             if (handover)
> > > > > > +             if (handover) {
> > > > > > +                     rcu_console_stall_unsuppress();
> > > > > >                       return;
> > > > > > +             }
> > > > > >
> > > > > >               if (do_cond_resched)
> > > > > >                       cond_resched();
> > > > > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > > > > >       retry = prb_read_valid(prb, next_seq, NULL);
> > > > > >       if (retry && console_trylock())
> > > > > >               goto again;
> > > > > > +
> > > > > > +     rcu_console_stall_unsuppress();
> > > > > >  }
> > > > >
> > > > > May be we can just start touching watchdogs from printing routine?
> > > > >
> > > > Hrm, console_unlock is called from vprintk_emit [0] with preemption
> > > > disabled. and it already has the logic implemented to call
> > > > cond_resched when possible [1].
> > > >
> > > > [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> > > > [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719
> > >
> > > So when we are having problems is when console_may_schedule == 0?
> >
> > Just following up...  Any progress on this?  The ability to suppress RCU
> > CPU stall warnings due to console slowness would likely be valuable to
> > quite a few people.
> >
> 
> My understanding is that the consensus is that the proper fix is the
> printk threads

Yes. The stalls are real. printk() could cause even livelockups or
slow down the entire system. It is good to know when it happens.


> currently under work and it wouldn't take long before
> it is ready to review.

It is hard to predict. It might be ready within half a year but
it might also take much longer time. It is not that trivial.
We need to make sure that the messages get out when the system
is in troubles and kthreads are not scheduled :-(

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices
  2022-01-10 11:10             ` Petr Mladek
@ 2022-01-17 11:26               ` Wander Costa
  0 siblings, 0 replies; 17+ messages in thread
From: Wander Costa @ 2022-01-17 11:26 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Paul E . McKenney, Sergey Senozhatsky, Wander Lairson Costa,
	Steven Rostedt, John Ogness, open list

On Mon, Jan 10, 2022 at 8:10 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2022-01-07 17:16:23, Wander Costa wrote:
> > On Fri, Jan 7, 2022 at 4:03 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Nov 12, 2021 at 06:57:55AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Nov 12, 2021 at 11:42:39AM -0300, Wander Costa wrote:
> > > > > On Thu, Nov 11, 2021 at 10:42 PM Sergey Senozhatsky
> > > > > <senozhatsky@chromium.org> wrote:
> > > > > >
> > > > > > On (21/11/11 16:59), Wander Lairson Costa wrote:
> > > > > > >
> > > > > > > If we have a reasonable large dataset to flush in the printk ring
> > > > > > > buffer in the presence of a slow console device (like a serial port
> > > > > > > with a low baud rate configured), the RCU stall detector may report
> > > > > > > warnings.
> > > > > > >
> > > > > > > This patch suppresses RCU stall warnings while flushing the ring buffer
> > > > > > > to the console.
> > > > > > >
> > > > > > [..]
> > > > > > > +extern int rcu_cpu_stall_suppress;
> > > > > > > +
> > > > > > > +static void rcu_console_stall_suppress(void)
> > > > > > > +{
> > > > > > > +     if (!rcu_cpu_stall_suppress)
> > > > > > > +             rcu_cpu_stall_suppress = 4;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void rcu_console_stall_unsuppress(void)
> > > > > > > +{
> > > > > > > +     if (rcu_cpu_stall_suppress == 4)
> > > > > > > +             rcu_cpu_stall_suppress = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * console_unlock - unlock the console system
> > > > > > >   *
> > > > > > > @@ -2634,6 +2648,9 @@ void console_unlock(void)
> > > > > > >        * and cleared after the "again" goto label.
> > > > > > >        */
> > > > > > >       do_cond_resched = console_may_schedule;
> > > > > > > +
> > > > > > > +     rcu_console_stall_suppress();
> > > > > > > +
> > > > > > >  again:
> > > > > > >       console_may_schedule = 0;
> > > > > > >
> > > > > > > @@ -2645,6 +2662,7 @@ void console_unlock(void)
> > > > > > >       if (!can_use_console()) {
> > > > > > >               console_locked = 0;
> > > > > > >               up_console_sem();
> > > > > > > +             rcu_console_stall_unsuppress();
> > > > > > >               return;
> > > > > > >       }
> > > > > > >
> > > > > > > @@ -2716,8 +2734,10 @@ void console_unlock(void)
> > > > > > >
> > > > > > >               handover = console_lock_spinning_disable_and_check();
> > > > > > >               printk_safe_exit_irqrestore(flags);
> > > > > > > -             if (handover)
> > > > > > > +             if (handover) {
> > > > > > > +                     rcu_console_stall_unsuppress();
> > > > > > >                       return;
> > > > > > > +             }
> > > > > > >
> > > > > > >               if (do_cond_resched)
> > > > > > >                       cond_resched();
> > > > > > > @@ -2738,6 +2758,8 @@ void console_unlock(void)
> > > > > > >       retry = prb_read_valid(prb, next_seq, NULL);
> > > > > > >       if (retry && console_trylock())
> > > > > > >               goto again;
> > > > > > > +
> > > > > > > +     rcu_console_stall_unsuppress();
> > > > > > >  }
> > > > > >
> > > > > > May be we can just start touching watchdogs from printing routine?
> > > > > >
> > > > > Hrm, console_unlock is called from vprintk_emit [0] with preemption
> > > > > disabled. and it already has the logic implemented to call
> > > > > cond_resched when possible [1].
> > > > >
> > > > > [0] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2244
> > > > > [1] https://elixir.bootlin.com/linux/latest/source/kernel/printk/printk.c#L2719
> > > >
> > > > So when we are having problems is when console_may_schedule == 0?
> > >
> > > Just following up...  Any progress on this?  The ability to suppress RCU
> > > CPU stall warnings due to console slowness would likely be valuable to
> > > quite a few people.
> > >
> >
> > My understanding is that the consensus is that the proper fix is the
> > printk threads
>
> Yes. The stalls are real. printk() could cause even livelockups or
> slow down the entire system. It is good to know when it happens.
>
>
> > currently under work and it wouldn't take long before
> > it is ready to review.
>
> It is hard to predict. It might be ready within half a year but
> it might also take much longer time. It is not that trivial.
> We need to make sure that the messages get out when the system
> is in troubles and kthreads are not scheduled :-(
>

I am not sure what the policy is regarding temporary fixes, but since
we don't know when
printk threads will be ready for merge, there are real false positives
this patch fixes. Is it possible to land this patch in the meantime?
I understand that people might want to see these reports in some use
cases. Maybe if I add a build option to this feature?

> Best Regards,
> Petr
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-01-17 11:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-11 19:59 [PATCH v2 0/1] printk: suppress rcu stall warnings caused by slow Wander Lairson Costa
2021-11-11 19:59 ` [PATCH v2 1/1] printk: suppress rcu stall warnings caused by slow console devices Wander Lairson Costa
2021-11-11 21:13   ` Paul E. McKenney
2021-11-11 23:28   ` kernel test robot
2021-11-11 23:33   ` kernel test robot
2021-11-12  1:41   ` Sergey Senozhatsky
2021-11-12  2:23     ` Paul E. McKenney
2021-11-12 14:42     ` Wander Costa
2021-11-12 14:57       ` Paul E. McKenney
2021-11-12 14:58         ` Wander Costa
2022-01-07 18:53         ` Paul E. McKenney
2022-01-07 20:16           ` Wander Costa
2022-01-10 11:10             ` Petr Mladek
2022-01-17 11:26               ` Wander Costa
2021-11-12  8:45   ` Petr Mladek
2021-11-12 14:08     ` Wander Costa
2021-11-15 10:10       ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).