All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
@ 2015-12-10 14:52 Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-10 14:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

*** in this email and in every later emails ***
    Sorry, if I messed up with Cc list or message-ids. It's suprisingly
hard to jump in into a loop that has never been in your inbox. It took
some `googling' effort.

I haven't tested the patch set yet, I just 'ported' it to linux-next.
I reverted 073696a8bc7779b ("printk: do cond_resched() between lines while
outputting to consoles") as a first step, but it comes in later again. I can
send out the updated series (off list is OK).

> Currently, console_unlock() prints messages from kernel printk buffer to
> console while the buffer is non-empty. When serial console is attached,
> printing is slow and thus other CPUs in the system have plenty of time
> to append new messages to the buffer while one CPU is printing. Thus the
> CPU can spend unbounded amount of time doing printing in console_unlock().
> This is especially serious problem if the printk() calling
> console_unlock() was called with interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Also during artificial stress testing SATA disk
> disappears from the system because its interrupts aren't served for too
> long.
>
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by waking up one of dedicated kthreads.  As
> soon as the printing CPU notices kthread got scheduled and is spinning
> on print_lock dedicated for that purpose, it drops console_sem,
> print_lock, and exits console_unlock(). Kthread then takes over printing
> instead. This way no CPU should spend printing too long even if there
> is heavy printk traffic.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

I think we better use raw_spin_lock as a print_lock; and, apart from that,
seems that we don't re-init in zap_lock(). So I ended up with the following
patch on top of yours (to be folded):

- use raw_spin_lock
- do not forget to re-init `print_lock' in zap_locks()
---
 kernel/printk/printk.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d986599..2a86ff1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,7 +85,7 @@ EXPORT_SYMBOL_GPL(console_drivers);
  * we can spin on it when some other thread wants to take over printing to
  * console.
  */
-static DEFINE_SPINLOCK(print_lock);
+static DEFINE_RAW_SPINLOCK(print_lock);
 
 /*
  * Number of printing threads spinning on print_lock. Can go away once
@@ -1516,6 +1516,7 @@ static void zap_locks(void)
 	/* If a crash is occurring, make sure we can't deadlock */
 	raw_spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
+	raw_spin_lock_init(&print_lock);
 	sema_init(&console_sem, 1);
 }
 
@@ -2311,7 +2312,7 @@ void console_unlock(void)
 	console_cont_flush(text, sizeof(text));
 again:
 	retry = false;
-	spin_lock_irqsave(&print_lock, flags);
+	raw_spin_lock_irqsave(&print_lock, flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2410,7 +2411,7 @@ skip:
 	 * succeeds in getting console_sem (unless someone else takes it and
 	 * then he'll be responsible for printing).
          */
-	spin_unlock_irqrestore(&print_lock, flags);
+	raw_spin_unlock_irqrestore(&print_lock, flags);
 
 	/*
 	 * In case we cannot trylock the console_sem again, there's a new owner
@@ -2773,9 +2774,9 @@ static int printing_task(void *arg)
 		 * want to sleep once we got scheduled to make sure we take
 		 * over printing without depending on the scheduler.
 		 */
-		spin_lock_irqsave(&print_lock, flags);
+		raw_spin_lock_irqsave(&print_lock, flags);
 		atomic_dec(&printing_tasks_spinning);
-		spin_unlock_irqrestore(&print_lock, flags);
+		raw_spin_unlock_irqrestore(&print_lock, flags);
 		if (console_trylock())
 			console_unlock();
 		preempt_enable();
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread
* [PATCH 0/6 v2] printk: Softlockup avoidance
@ 2015-10-26  4:52 Jan Kara
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Hello,

here is another posting of the revived patch set to avoid lockups during heavy
printing. Lately there were several attempts at dealing with softlockups due to
heavy printk traffic [1] [2] and I've been also privately pinged by couple of
people about the state of the patch set, so I've decided to revive the patch
set. Patches (their older version) are present in SUSE enterprise kernels for
several years and we didn't observe any issues with them.

Patch set passes my stress testing where serial console is slowed down to print
~1000 chars per second and there are 100 delayed works printing together some
64k of text and in parallel modules are inserted which generates quite some
additional messages, stop_machine() calls etc.

Changes since v1:
* printing kthreads now check for kthread_should_stop()
* printing kthreads are now bound to CPUs so that scheduler cannot decide to
  schedule both kthreads on one CPU which effectively makes it impossible to
  hand over printing between them. This happened relatively frequently in
  virtual machines.
* use printk buffer draining code in panic() to force all messages out when
  the system is dying
* better naming of logbuf flushing functions suggested by AKPM
* fixed irq safety of printing lock as pointed out by AKMP
* fixed various smaller issues pointed by AKPM


Changes since the the old patch set [3]:
* I have replaced the state machine to pass printing and spinning on
  console_sem with a simple spinlock which makes the code
  somewhat easier to read and verify.
* Some of the patches were merged so I dropped them.

To remind the original problem:

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious when printk() gets called under some critical
spinlock or with interrupts disabled.
    
In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.

This series addresses the problem in the following way: If CPU has printed
more that printk_offload (defaults to 1000) characters, it wakes up one
of dedicated printk kthreads (we don't use workqueue because that has
deadlock potential if printk was called from workqueue code). Once we find
out kthread is spinning on a lock, we stop printing, drop console_sem, and
let kthread continue printing. Since there are two printing kthreads, they
will pass printing between them and thus no CPU gets hogged by printing.

								Honza

[1] https://lkml.org/lkml/2015/7/8/215
[2] http://marc.info/?l=linux-kernel&m=143929238407816&w=2
[3] https://lkml.org/lkml/2014/3/17/68

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-03-02  9:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
2015-12-10 15:24 ` Sergey Senozhatsky
2015-12-11  4:27 ` Sergey Senozhatsky
2015-12-11  6:29   ` Sergey Senozhatsky
2015-12-22 13:47 ` Jan Kara
2015-12-22 14:48   ` Sergey Senozhatsky
2015-12-23  1:54   ` Sergey Senozhatsky
2015-12-23  3:37     ` Sergey Senozhatsky
2015-12-23  3:57       ` Sergey Senozhatsky
2015-12-23  4:15         ` Sergey Senozhatsky
2016-01-05 14:37     ` Jan Kara
2016-01-06  1:41       ` Sergey Senozhatsky
2016-01-06  6:48       ` Sergey Senozhatsky
2016-01-06 12:25         ` Jan Kara
2016-01-11 13:25           ` Sergey Senozhatsky
2015-12-31  2:44   ` Sergey Senozhatsky
2015-12-31  3:13     ` Sergey Senozhatsky
2015-12-31  4:58       ` Sergey Senozhatsky
2016-01-05 14:48         ` Jan Kara
2016-01-06  3:38           ` Sergey Senozhatsky
2016-01-06  8:36             ` Sergey Senozhatsky
2016-01-06 10:21               ` Jan Kara
2016-01-06 11:10                 ` Sergey Senozhatsky
2016-01-11 12:54   ` Petr Mladek
2016-01-12 14:00     ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
2016-03-01 17:22   ` Denys Vlasenko
2016-03-02  9:30     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.