All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] printk: do cond_resched() between lines while outputting to consoles
@ 2015-11-24 21:31 Tejun Heo
  2015-11-25  9:05 ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2015-11-24 21:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Calvin Owens, Jan Kara, Dave Jones, Kyle McMartin, linux-kernel,
	kernel-team

@console_may_schedule tracks whether console_sem was acquired through
lock or trylock.  If former, we're inside a sleepable context and
console_conditional_schedule() performs cond_resched().  This allows
console drivers which use console_lock for synchronization to yield
while performing time-consuming operations such as scrolling.

However, the actual console outputting is performed while holding
irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
before starting outputting lines.  Also, only a few drivers call
console_conditional_schedule() to begin with.  This means that when a
lot of lines need to be output by console_unlock(), for example on a
console registration, the task doing console_unlock() may not yield
for a long time on a non-preemptible kernel.

If this happens with a slow console devices, for example a serial
console, the outputting task may occupy the cpu for a very long time.
Long enough to trigger softlockup and/or RCU stall warnings, which in
turn pile more messages, sometimes enough to trigger the next cycle of
warnings incapacitating the system.

Fix it by making console_unlock() insert cond_resched() beween lines
if @console_may_schedule.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Calvin Owens <calvinowens@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Kyle McMartin <kyle@kernel.org>
Cc: stable@vger.kernel.org
---
 kernel/printk/printk.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2233,13 +2233,24 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
-	bool retry;
+	bool do_cond_resched, retry;
 
 	if (console_suspended) {
 		up_console_sem();
 		return;
 	}
 
+	/*
+	 * Console drivers are called under logbuf_lock, so
+	 * @console_may_schedule should be cleared before; however, we may
+	 * end up dumping a lot of lines, for example, if called from
+	 * console registration path, and should invoke cond_resched()
+	 * between lines if allowable.  Not doing so can cause a very long
+	 * scheduling stall on a slow console leading to RCU stall and
+	 * softlockup warnings which exacerbate the issue with more
+	 * messages practically incapacitating the system.
+	 */
+	do_cond_resched = console_may_schedule;
 	console_may_schedule = 0;
 
 	/* flush buffered message fragment immediately to console */
@@ -2311,6 +2322,9 @@ skip:
 		call_console_drivers(level, ext_text, ext_len, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
+
+		if (do_cond_resched)
+			cond_resched();
 	}
 	console_locked = 0;
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] printk: do cond_resched() between lines while outputting to consoles
  2015-11-24 21:31 [PATCH] printk: do cond_resched() between lines while outputting to consoles Tejun Heo
@ 2015-11-25  9:05 ` Jan Kara
  2015-11-25 17:02   ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kara @ 2015-11-25  9:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Calvin Owens, Jan Kara, Dave Jones, Kyle McMartin,
	linux-kernel, kernel-team

On Tue 24-11-15 16:31:25, Tejun Heo wrote:
> @console_may_schedule tracks whether console_sem was acquired through
> lock or trylock.  If former, we're inside a sleepable context and
> console_conditional_schedule() performs cond_resched().  This allows
> console drivers which use console_lock for synchronization to yield
> while performing time-consuming operations such as scrolling.
> 
> However, the actual console outputting is performed while holding
> irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule
> before starting outputting lines.  Also, only a few drivers call
> console_conditional_schedule() to begin with.  This means that when a
> lot of lines need to be output by console_unlock(), for example on a
> console registration, the task doing console_unlock() may not yield
> for a long time on a non-preemptible kernel.

So did you particularly have an issue during console registration? Because
at least our customers mostly have issues during heavy use of ordinary
printk (e.g. during boot or when hardware gets probed) and your change
doesn't affect that case. That being said if you really hit a case where
your patch helps, then I have no problem with it (you can add my Acked-by).

At Kernel Summit I spoke with Linus and Andrew regarding printk softlockups
and we ended up with a decision that we decouple queueing into kernel
ringbuffer from the actual printing into console which would happen from
kthread / workqueue. Then the lockups would be solved by printing to
console happening from schedulable context and printk() as such being
independent from console speed. We only have to have some special cases
there for crashes so that messages get printed synchronously in that case.

								Honza

> If this happens with a slow console devices, for example a serial
> console, the outputting task may occupy the cpu for a very long time.
> Long enough to trigger softlockup and/or RCU stall warnings, which in
> turn pile more messages, sometimes enough to trigger the next cycle of
> warnings incapacitating the system.
> 
> Fix it by making console_unlock() insert cond_resched() beween lines
> if @console_may_schedule.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Calvin Owens <calvinowens@fb.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Jones <davej@codemonkey.org.uk>
> Cc: Kyle McMartin <kyle@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  kernel/printk/printk.c |   16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2233,13 +2233,24 @@ void console_unlock(void)
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
> -	bool retry;
> +	bool do_cond_resched, retry;
>  
>  	if (console_suspended) {
>  		up_console_sem();
>  		return;
>  	}
>  
> +	/*
> +	 * Console drivers are called under logbuf_lock, so
> +	 * @console_may_schedule should be cleared before; however, we may
> +	 * end up dumping a lot of lines, for example, if called from
> +	 * console registration path, and should invoke cond_resched()
> +	 * between lines if allowable.  Not doing so can cause a very long
> +	 * scheduling stall on a slow console leading to RCU stall and
> +	 * softlockup warnings which exacerbate the issue with more
> +	 * messages practically incapacitating the system.
> +	 */
> +	do_cond_resched = console_may_schedule;
>  	console_may_schedule = 0;
>  
>  	/* flush buffered message fragment immediately to console */
> @@ -2311,6 +2322,9 @@ skip:
>  		call_console_drivers(level, ext_text, ext_len, text, len);
>  		start_critical_timings();
>  		local_irq_restore(flags);
> +
> +		if (do_cond_resched)
> +			cond_resched();
>  	}
>  	console_locked = 0;
>  
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] printk: do cond_resched() between lines while outputting to consoles
  2015-11-25  9:05 ` Jan Kara
@ 2015-11-25 17:02   ` Tejun Heo
  2015-11-26  8:53     ` Jan Kara
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2015-11-25 17:02 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin,
	linux-kernel, kernel-team

Hello, Jan.

On Wed, Nov 25, 2015 at 10:05:22AM +0100, Jan Kara wrote:
> So did you particularly have an issue during console registration? Because

Yeap, we're seeing a small ratio of machines falling head over hills
during IPMI serial console registration.  Pumping out the messages
collected prior to registration takes too long triggering softlockup
warning on all forty something CPUs which pile a metric ton of
messages atop.  From then on, softlockup / rcu stall warnings repeat
themselves.  Some machines recover after >10mins of doing that.  The
log is hillarious to look at afterward.

> at least our customers mostly have issues during heavy use of ordinary
> printk (e.g. during boot or when hardware gets probed) and your change
> doesn't affect that case. That being said if you really hit a case where

Hah, that must be a lot of messages being printk'd.

> your patch helps, then I have no problem with it (you can add my Acked-by).
> 
> At Kernel Summit I spoke with Linus and Andrew regarding printk softlockups
> and we ended up with a decision that we decouple queueing into kernel
> ringbuffer from the actual printing into console which would happen from
> kthread / workqueue. Then the lockups would be solved by printing to
> console happening from schedulable context and printk() as such being
> independent from console speed. We only have to have some special cases
> there for crashes so that messages get printed synchronously in that case.

Yeah, we'd prolly want to make the behavior contingent on the time
taken and so on.  At any rate, even with workqueue-deferred dumping,
this patch would still be necessary for non-preemptible kernels;
otherwise, there's no cond_resched() in printing path right now.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] printk: do cond_resched() between lines while outputting to consoles
  2015-11-25 17:02   ` Tejun Heo
@ 2015-11-26  8:53     ` Jan Kara
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kara @ 2015-11-26  8:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jan Kara, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin,
	linux-kernel, kernel-team

  Hello,

On Wed 25-11-15 12:02:17, Tejun Heo wrote:
> On Wed, Nov 25, 2015 at 10:05:22AM +0100, Jan Kara wrote:
> > So did you particularly have an issue during console registration? Because
> 
> Yeap, we're seeing a small ratio of machines falling head over hills
> during IPMI serial console registration.  Pumping out the messages
> collected prior to registration takes too long triggering softlockup
> warning on all forty something CPUs which pile a metric ton of
> messages atop.  From then on, softlockup / rcu stall warnings repeat
> themselves.  Some machines recover after >10mins of doing that.  The
> log is hillarious to look at afterward.

OK, then feel free to add my:

Acked-by: Jan Kara <jack@suse.com>

> > at least our customers mostly have issues during heavy use of ordinary
> > printk (e.g. during boot or when hardware gets probed) and your change
> > doesn't affect that case. That being said if you really hit a case where
> 
> Hah, that must be a lot of messages being printk'd.

Yes, it is. They have ~1000 SCSI devices attached (250 disks, each over 4
paths) and similar stuff. But also doing sysrq-t on a large machine
generates enough output to kill the machine...

> > your patch helps, then I have no problem with it (you can add my Acked-by).
> > 
> > At Kernel Summit I spoke with Linus and Andrew regarding printk softlockups
> > and we ended up with a decision that we decouple queueing into kernel
> > ringbuffer from the actual printing into console which would happen from
> > kthread / workqueue. Then the lockups would be solved by printing to
> > console happening from schedulable context and printk() as such being
> > independent from console speed. We only have to have some special cases
> > there for crashes so that messages get printed synchronously in that case.
> 
> Yeah, we'd prolly want to make the behavior contingent on the time
> taken and so on.  At any rate, even with workqueue-deferred dumping,
> this patch would still be necessary for non-preemptible kernels;
> otherwise, there's no cond_resched() in printing path right now.

Yup.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-11-26  8:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-24 21:31 [PATCH] printk: do cond_resched() between lines while outputting to consoles Tejun Heo
2015-11-25  9:05 ` Jan Kara
2015-11-25 17:02   ` Tejun Heo
2015-11-26  8:53     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.