All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
@ 2015-12-10 14:52 Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-10 14:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Hello,

*** in this email and in every later emails ***
    Sorry, if I messed up with Cc list or message-ids. It's suprisingly
hard to jump in into a loop that has never been in your inbox. It took
some `googling' effort.

I haven't tested the patch set yet, I just 'ported' it to linux-next.
I reverted 073696a8bc7779b ("printk: do cond_resched() between lines while
outputting to consoles") as a first step, but it comes in later again. I can
send out the updated series (off list is OK).

> Currently, console_unlock() prints messages from kernel printk buffer to
> console while the buffer is non-empty. When serial console is attached,
> printing is slow and thus other CPUs in the system have plenty of time
> to append new messages to the buffer while one CPU is printing. Thus the
> CPU can spend unbounded amount of time doing printing in console_unlock().
> This is especially serious problem if the printk() calling
> console_unlock() was called with interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Also during artificial stress testing SATA disk
> disappears from the system because its interrupts aren't served for too
> long.
>
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by waking up one of dedicated kthreads.  As
> soon as the printing CPU notices kthread got scheduled and is spinning
> on print_lock dedicated for that purpose, it drops console_sem,
> print_lock, and exits console_unlock(). Kthread then takes over printing
> instead. This way no CPU should spend printing too long even if there
> is heavy printk traffic.
>
> Signed-off-by: Jan Kara <jack@suse.cz>

I think we better use raw_spin_lock as a print_lock; and, apart from that,
seems that we don't re-init in zap_lock(). So I ended up with the following
patch on top of yours (to be folded):

- use raw_spin_lock
- do not forget to re-init `print_lock' in zap_locks()
---
 kernel/printk/printk.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d986599..2a86ff1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -85,7 +85,7 @@ EXPORT_SYMBOL_GPL(console_drivers);
  * we can spin on it when some other thread wants to take over printing to
  * console.
  */
-static DEFINE_SPINLOCK(print_lock);
+static DEFINE_RAW_SPINLOCK(print_lock);
 
 /*
  * Number of printing threads spinning on print_lock. Can go away once
@@ -1516,6 +1516,7 @@ static void zap_locks(void)
 	/* If a crash is occurring, make sure we can't deadlock */
 	raw_spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
+	raw_spin_lock_init(&print_lock);
 	sema_init(&console_sem, 1);
 }
 
@@ -2311,7 +2312,7 @@ void console_unlock(void)
 	console_cont_flush(text, sizeof(text));
 again:
 	retry = false;
-	spin_lock_irqsave(&print_lock, flags);
+	raw_spin_lock_irqsave(&print_lock, flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
@@ -2410,7 +2411,7 @@ skip:
 	 * succeeds in getting console_sem (unless someone else takes it and
 	 * then he'll be responsible for printing).
          */
-	spin_unlock_irqrestore(&print_lock, flags);
+	raw_spin_unlock_irqrestore(&print_lock, flags);
 
 	/*
 	 * In case we cannot trylock the console_sem again, there's a new owner
@@ -2773,9 +2774,9 @@ static int printing_task(void *arg)
 		 * want to sleep once we got scheduled to make sure we take
 		 * over printing without depending on the scheduler.
 		 */
-		spin_lock_irqsave(&print_lock, flags);
+		raw_spin_lock_irqsave(&print_lock, flags);
 		atomic_dec(&printing_tasks_spinning);
-		spin_unlock_irqrestore(&print_lock, flags);
+		raw_spin_unlock_irqrestore(&print_lock, flags);
 		if (console_trylock())
 			console_unlock();
 		preempt_enable();
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
@ 2015-12-10 15:24 ` Sergey Senozhatsky
  2015-12-11  4:27 ` Sergey Senozhatsky
  2015-12-22 13:47 ` Jan Kara
  2 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-10 15:24 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/10/15 23:52), Sergey Senozhatsky wrote:
> 
> I think we better use raw_spin_lock as a print_lock; and, apart from that,
> seems that we don't re-init in zap_lock(). So I ended up with the following
> patch on top of yours (to be folded):
> 
> - use raw_spin_lock
> - do not forget to re-init `print_lock' in zap_locks()

while we are on this, what do you guys think?


CPU1                                        CPU2
console_unlock()
  call_console_drivers()
    con->write()
       ... spin_lock ... uart, etc          panic

zap_lock() will raw_spin_lock_init(&logbuf_lock) and
sema_init(&console_sem, 1), but we still have `spin_lock'
held by con->write().

so the `panic' flush or print out will see con->write() being already
blocked.


===8<====

We do zap_lock() in printk to make a panic print out possible, but we
can end up having a locked serial console - e.g. panic has occurred
whilst CPUx was in con->write(), which takes some internal locks, thus
call_console_drivers() will perform con->write() on an already locked
console.

Try to reset() every console in zap_lock() via console specific
->reset() call.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 include/linux/console.h | 1 +
 kernel/printk/printk.c  | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index bd19434..1cb8f72 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -125,6 +125,7 @@ struct console {
 	void	(*unblank)(void);
 	int	(*setup)(struct console *, char *);
 	int	(*match)(struct console *, char *name, int idx, char *options);
+	void	(*reset)(struct console *);
 	short	flags;
 	short	index;
 	int	cflag;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f4a9565..ad172c4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1580,6 +1580,7 @@ static void call_console_drivers(int level,
  */
 static void zap_locks(void)
 {
+	struct console *c;
 	static unsigned long oops_timestamp;
 
 	if (time_after_eq(jiffies, oops_timestamp) &&
@@ -1589,6 +1590,11 @@ static void zap_locks(void)
 	oops_timestamp = jiffies;
 
 	debug_locks_off();
+
+	for_each_console(c)
+		if ((c->flags & CON_ENABLED) && c->reset)
+			c->reset(c);
+
 	/* If a crash is occurring, make sure we can't deadlock */
 	raw_spin_lock_init(&logbuf_lock);
 	/* And make sure that we print immediately */
-- 
2.6.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
@ 2015-12-11  4:27 ` Sergey Senozhatsky
  2015-12-11  6:29   ` Sergey Senozhatsky
  2015-12-22 13:47 ` Jan Kara
  2 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-11  4:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/10/15 23:52), Sergey Senozhatsky wrote:
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> >
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> >
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>


> static bool cpu_stop_printing(int printed_chars)
> {
>         /* Oops? Print everything now to maximize chances user will see it */
>         if (oops_in_progress)
>                 return false;
>         if (!printk_offload_chars || printed_chars < printk_offload_chars)
>                 return false;
>         /*
>          * Make sure we load fresh value of printing_tasks_spinning. Matches
>          * the barrier in printing_task()
>          */
>         smp_rmb();
>         if (atomic_read(&printing_tasks_spinning))
>                 return true;
>         wake_up(&print_queue);
>
-	return false;
+	return true;
> }


we just woke up a task we will offload printing to. no need to do another round
of call_console_drivers() from current, possibly overrunning printk_offload_chars
by strlen of next msg, which can be close to max text length in the worst case,
while woken up print task will burn cpu cycles spinning on the print_lock.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-11  4:27 ` Sergey Senozhatsky
@ 2015-12-11  6:29   ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-11  6:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Srinivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (12/11/15 13:27), Sergey Senozhatsky wrote:
[..]
> > static bool cpu_stop_printing(int printed_chars)
> > {
> >         /* Oops? Print everything now to maximize chances user will see it */
> >         if (oops_in_progress)
> >                 return false;
> >         if (!printk_offload_chars || printed_chars < printk_offload_chars)
> >                 return false;
> >         /*
> >          * Make sure we load fresh value of printing_tasks_spinning. Matches
> >          * the barrier in printing_task()
> >          */
> >         smp_rmb();
> >         if (atomic_read(&printing_tasks_spinning))
> >                 return true;
> >         wake_up(&print_queue);
> >
> -	return false;
> +	return true;
> > }

*just as an idea*, I was thinking about having two different offload
limits -- for user space processes doing console_unlock() for whatever
reason (printk in syscall or because of console_lock, etc.) and for
KTHREADS. the kernel threads can have normal offload_limit, while user
space processes can return back from syscal sooner (doing only half of
printk worload, for example). but this is probably too `custom', though
sort of make some sense.

(compile tested only)

---
 kernel/printk/printk.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 79915da..cff1dd1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -330,6 +330,7 @@ static struct kernel_param_ops offload_chars_ops = {
  * 0.1s maximum latency due to printing.
  */
 static unsigned int __read_mostly printk_offload_chars = 1000;
+static unsigned int __read_mostly printk_offload_limits[2] = {500, 1000};
 
 module_param_cb(offload_chars, &offload_chars_ops, &printk_offload_chars,
 		   S_IRUGO | S_IWUSR);
@@ -343,10 +344,14 @@ MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
  */
 static bool cpu_stop_printing(int printed_chars)
 {
+	bool type = current->flags & PF_KTHREAD;
+
 	/* Oops? Print everything now to maximize chances user will see it */
 	if (oops_in_progress)
 		return false;
-	if (!printk_offload_chars || printed_chars < printk_offload_chars)
+	if (!printk_offload_chars)
+		return false;
+	if (printed_chars < printk_offload_limits[type])
 		return false;
 	/*
 	 * Make sure we load fresh value of printing_tasks_spinning. Matches
@@ -2995,6 +3000,12 @@ out_err:
 	return ret;
 }
 
+static void offload_limits_set(void)
+{
+	printk_offload_limits[0] = printk_offload_chars >> 1;
+	printk_offload_limits[1] = printk_offload_chars;
+}
+
 static int offload_chars_set(const char *val, const struct kernel_param *kp)
 {
 	int ret;
@@ -3006,6 +3017,8 @@ static int offload_chars_set(const char *val, const struct kernel_param *kp)
 		mutex_unlock(&printing_kthread_mutex);
 		return ret;
 	}
+
+	offload_limits_set();
 	ret = printk_start_offload_kthreads();
 	mutex_unlock(&printing_kthread_mutex);
 	return ret;
@@ -3014,11 +3027,13 @@ static int offload_chars_set(const char *val, const struct kernel_param *kp)
 static void printk_offload_init(void)
 {
 	mutex_lock(&printing_kthread_mutex);
+	offload_limits_set();
 	if (num_possible_cpus() <= 1) {
 		/* Offloading doesn't make sense. Disable print offloading. */
 		printk_offload_chars = 0;
-	} else
+	} else {
 		printk_start_offload_kthreads();
+	}
 	mutex_unlock(&printing_kthread_mutex);
 }
 
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
  2015-12-10 15:24 ` Sergey Senozhatsky
  2015-12-11  4:27 ` Sergey Senozhatsky
@ 2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
                     ` (3 more replies)
  2 siblings, 4 replies; 28+ messages in thread
From: Jan Kara @ 2015-12-22 13:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 3610 bytes --]

On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> Hello,
> 
> *** in this email and in every later emails ***
>     Sorry, if I messed up with Cc list or message-ids. It's suprisingly
> hard to jump in into a loop that has never been in your inbox. It took
> some `googling' effort.
> 
> I haven't tested the patch set yet, I just 'ported' it to linux-next.
> I reverted 073696a8bc7779b ("printk: do cond_resched() between lines while
> outputting to consoles") as a first step, but it comes in later again. I can
> send out the updated series (off list is OK).
> 
> > Currently, console_unlock() prints messages from kernel printk buffer to
> > console while the buffer is non-empty. When serial console is attached,
> > printing is slow and thus other CPUs in the system have plenty of time
> > to append new messages to the buffer while one CPU is printing. Thus the
> > CPU can spend unbounded amount of time doing printing in console_unlock().
> > This is especially serious problem if the printk() calling
> > console_unlock() was called with interrupts disabled.
> >
> > In practice users have observed a CPU can spend tens of seconds printing
> > in console_unlock() (usually during boot when hundreds of SCSI devices
> > are discovered) resulting in RCU stalls (CPU doing printing doesn't
> > reach quiescent state for a long time), softlockup reports (IPIs for the
> > printing CPU don't get served and thus other CPUs are spinning waiting
> > for the printing CPU to process IPIs), and eventually a machine death
> > (as messages from stalls and lockups append to printk buffer faster than
> > we are able to print). So these machines are unable to boot with serial
> > console attached. Also during artificial stress testing SATA disk
> > disappears from the system because its interrupts aren't served for too
> > long.
> >
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> I think we better use raw_spin_lock as a print_lock; and, apart from that,
> seems that we don't re-init in zap_lock(). So I ended up with the following
> patch on top of yours (to be folded):
> 
> - use raw_spin_lock
> - do not forget to re-init `print_lock' in zap_locks()

Thanks for looking into my patches and sorry for replying with a delay. As
I wrote in my previous email [1] even the referenced patches are not quite
enough. Over last few days I have worked on redoing the stuff as we
discussed with Linus and Andrew at Kernel Summit and I have new patches
which are working fine for me. I still want to test them on some machines
having real issues with udev during boot but so far stress-testing with
serial console slowed down to ~1000 chars/sec on other machines and VMs
looks promising.

I'm attaching them in case you want to have a look. They are on top of
Tejun's patch adding cond_resched() (which is essential). I'll officially
submit the patches once the testing is finished (but I'm not sure when I
get to the problematic HW...).

								Honza

[1] http://www.spinics.net/lists/stable/msg111535.html
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-printk-Make-printk-completely-async.patch --]
[-- Type: text/x-patch, Size: 8977 bytes --]

>From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 21 Dec 2015 13:10:31 +0100
Subject: [PATCH 1/2] printk: Make printk() completely async

Currently, printk() sometimes waits for message to be printed to console
and sometimes it does not (when console_sem is held by some other
process). In case printk() grabs console_sem and starts printing to
console, it prints messages from kernel printk buffer until the buffer
is empty. When serial console is attached, printing is slow and thus
other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend unbounded
amount of time doing printing in console_unlock().  This is especially
serious problem if the printk() calling console_unlock() was called with
interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Another observed issue is that due to slow printk,
hardware discovery is slow and udev times out before kernel manages to
discover all the attached HW. Also during artificial stress testing SATA
disk disappears from the system because its interrupts aren't served for
too long.

This patch makes printk() completely asynchronous (similar to what
printk_deferred() did until now). It appends message to the kernel
printk buffer and queues work to do the printing to console. This has
the advantage that printing always happens from a schedulable contex and
thus we don't lockup any particular CPU or even interrupts. Also it has
the advantage that printk() is fast and thus kernel booting is not
slowed down by slow serial console. Disadvantage of this method is that
in case of crash there is higher chance that important messages won't
appear in console output (we may need working scheduling to print
message to console). We somewhat mitigate this risk by switching printk
to the original method of immediate printing to console if oops is in
progress.  Also for debugging purposes we provide printk.synchronous
kernel parameter which resorts to the original printk behavior.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt |  10 +++
 kernel/printk/printk.c              | 144 +++++++++++++++++++++---------------
 2 files changed, 95 insertions(+), 59 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d18fc8..4cf1bddeffc7 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	printk.synchronous=
+			By default kernel messages are printed to console
+			asynchronously (except during early boot or when oops
+			is happening). That avoids kernel stalling behind slow
+			serial console and thus avoids softlockups, interrupt
+			timeouts, or userspace timing out during heavy printing.
+			However for debugging problems, printing messages to
+			console immediately may be desirable. This option
+			enables such behavior.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 299c2f0e7350..d455d1bd0d2c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
 
+/*
+ * When true, printing to console will happen synchronously unless someone else
+ * is already printing messages.
+ */
+static bool __read_mostly printk_sync;
+module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
+
+#define PRINTK_PENDING_WAKEUP	0x01
+#define PRINTK_PENDING_OUTPUT	0x02
+
+static DEFINE_PER_CPU(int, printk_pending);
+
+static void printing_work_func(struct work_struct *work)
+{
+	console_lock();
+	console_unlock();
+}
+
+static DECLARE_WORK(printing_work, printing_work_func);
+
+static void wake_up_klogd_work_func(struct irq_work *irq_work)
+{
+	int pending = __this_cpu_xchg(printk_pending, 0);
+
+	/*
+	 * We just schedule regular work to do the printing from irq work. We
+	 * don't want to do printing here directly as that happens with
+	 * interrupts disabled and thus is bad for interrupt latency. We also
+	 * don't want to queue regular work from vprintk_emit() as that gets
+	 * called in various difficult contexts where schedule_work() could
+	 * deadlock.
+	 */
+	if (pending & PRINTK_PENDING_OUTPUT)
+		schedule_work(&printing_work);
+
+	if (pending & PRINTK_PENDING_WAKEUP)
+		wake_up_interruptible(&log_wait);
+}
+
+static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
+	.func = wake_up_klogd_work_func,
+	.flags = IRQ_WORK_LAZY,
+};
+
+void wake_up_klogd(void)
+{
+	preempt_disable();
+	if (waitqueue_active(&log_wait)) {
+		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	}
+	preempt_enable();
+}
+
+int printk_deferred(const char *fmt, ...)
+{
+	va_list args;
+	int r;
+
+	va_start(args, fmt);
+	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
+	va_end(args);
+
+	return r;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 	unsigned long flags;
 	int this_cpu;
 	int printed_len = 0;
-	bool in_sched = false;
+	bool sync_print = printk_sync;
 	/* cpu currently holding logbuf_lock in this function */
 	static unsigned int logbuf_cpu = UINT_MAX;
 
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
-		in_sched = true;
+		sync_print = false;
 	}
-
 	boot_delay_msec(level);
 	printk_delay();
 
@@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when oops is in progress, or when synchronous printing was
+	 * explicitely requested by kernel parameter.
+	 */
+	if (keventd_up() && !oops_in_progress && !sync_print) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+	} else
+		sync_print = true;
 	local_irq_restore(flags);
 
-	/* If called from the scheduler, we can not call up(). */
-	if (!in_sched) {
+	if (sync_print) {
 		lockdep_off();
 		/*
 		 * Disable preemption to avoid being preempted while holding
@@ -2688,60 +2768,6 @@ late_initcall(printk_late_init);
 
 #if defined CONFIG_PRINTK
 /*
- * Delayed printk version, for scheduler-internal messages:
- */
-#define PRINTK_PENDING_WAKEUP	0x01
-#define PRINTK_PENDING_OUTPUT	0x02
-
-static DEFINE_PER_CPU(int, printk_pending);
-
-static void wake_up_klogd_work_func(struct irq_work *irq_work)
-{
-	int pending = __this_cpu_xchg(printk_pending, 0);
-
-	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
-	}
-
-	if (pending & PRINTK_PENDING_WAKEUP)
-		wake_up_interruptible(&log_wait);
-}
-
-static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
-	.func = wake_up_klogd_work_func,
-	.flags = IRQ_WORK_LAZY,
-};
-
-void wake_up_klogd(void)
-{
-	preempt_disable();
-	if (waitqueue_active(&log_wait)) {
-		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	}
-	preempt_enable();
-}
-
-int printk_deferred(const char *fmt, ...)
-{
-	va_list args;
-	int r;
-
-	preempt_disable();
-	va_start(args, fmt);
-	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
-	va_end(args);
-
-	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	preempt_enable();
-
-	return r;
-}
-
-/*
  * printk rate limiting, lifted from the networking subsystem.
  *
  * This enforces a rate limit: not more than 10 kernel messages
-- 
2.6.2


[-- Attachment #3: 0002-printk-Skip-messages-on-oops.patch --]
[-- Type: text/x-patch, Size: 2753 bytes --]

>From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 21 Dec 2015 14:26:13 +0100
Subject: [PATCH 2/2] printk: Skip messages on oops

When there are too many messages in the kernel printk buffer it can take
very long to print them to console (especially when using slow serial
console). This is undesirable during oops so when we encounter oops and
there are more than 100 messages to print, print just the newest 100
messages and then the oops message.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d455d1bd0d2c..fc67ab70e9c7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -262,6 +262,9 @@ static u64 console_seq;
 static u32 console_idx;
 static enum log_flags console_prev;
 
+/* current record sequence when oops happened */
+static u64 oops_start_seq;
+
 /* the next printk record to read after the last 'clear' command */
 static u64 clear_seq;
 static u32 clear_idx;
@@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 					 NULL, 0, recursion_msg,
 					 strlen(recursion_msg));
 	}
+	if (oops_in_progress && !sync_print && !oops_start_seq)
+		oops_start_seq = log_next_seq;
 
 	/*
 	 * The printf needs to come first; we need the syslog
@@ -2292,6 +2297,12 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * When oops happens and there are more messages to be printed in the printk
+ * buffer that this, skip some mesages and print only this many newest messages.
+ */
+#define PRINT_MSGS_BEFORE_OOPS 100
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2348,7 +2359,28 @@ again:
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		/*
+		 * If oops happened and there are more than
+		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
+		 * skip them to make oops appear faster.
+		 */
+		if (oops_start_seq &&
+		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
+			len = sprintf(text,
+				"** %u printk messages dropped due to oops ** ",
+				(unsigned)(oops_start_seq - console_seq -
+					   PRINT_MSGS_BEFORE_OOPS));
+			if (console_seq < log_first_seq) {
+				console_seq = log_first_seq;
+				console_idx = log_first_idx;
+			}
+			while (console_seq <
+			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
+				console_idx = log_next(console_idx);
+				console_seq++;
+			}
+			console_prev = 0;
+		} else if (console_seq < log_first_seq) {
 			len = sprintf(text, "** %u printk messages dropped ** ",
 				      (unsigned)(log_first_seq - console_seq));
 
-- 
2.6.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
@ 2015-12-22 14:48   ` Sergey Senozhatsky
  2015-12-23  1:54   ` Sergey Senozhatsky
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-22 14:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On (12/22/15 14:47), Jan Kara wrote:
[..]
> Thanks for looking into my patches and sorry for replying with a delay. As
> I wrote in my previous email [1] even the referenced patches are not quite
> enough. Over last few days I have worked on redoing the stuff as we
> discussed with Linus and Andrew at Kernel Summit and I have new patches
> which are working fine for me. I still want to test them on some machines
> having real issues with udev during boot but so far stress-testing with
> serial console slowed down to ~1000 chars/sec on other machines and VMs
> looks promising.
> 
> I'm attaching them in case you want to have a look. They are on top of
> Tejun's patch adding cond_resched() (which is essential). I'll officially
> submit the patches once the testing is finished (but I'm not sure when I
> get to the problematic HW...).
> 

Hello,

Thanks a lot! Will take a look.

	-ss

> From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 13:10:31 +0100
> Subject: [PATCH 1/2] printk: Make printk() completely async
> 
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock().  This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
> 
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
> 
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and queues work to do the printing to console. This has
> the advantage that printing always happens from a schedulable contex and
> thus we don't lockup any particular CPU or even interrupts. Also it has
> the advantage that printk() is fast and thus kernel booting is not
> slowed down by slow serial console. Disadvantage of this method is that
> in case of crash there is higher chance that important messages won't
> appear in console output (we may need working scheduling to print
> message to console). We somewhat mitigate this risk by switching printk
> to the original method of immediate printing to console if oops is in
> progress.  Also for debugging purposes we provide printk.synchronous
> kernel parameter which resorts to the original printk behavior.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  Documentation/kernel-parameters.txt |  10 +++
>  kernel/printk/printk.c              | 144 +++++++++++++++++++++---------------
>  2 files changed, 95 insertions(+), 59 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 742f69d18fc8..4cf1bddeffc7 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  	printk.time=	Show timing data prefixed to each printk message line
>  			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
>  
> +	printk.synchronous=
> +			By default kernel messages are printed to console
> +			asynchronously (except during early boot or when oops
> +			is happening). That avoids kernel stalling behind slow
> +			serial console and thus avoids softlockups, interrupt
> +			timeouts, or userspace timing out during heavy printing.
> +			However for debugging problems, printing messages to
> +			console immediately may be desirable. This option
> +			enables such behavior.
> +
>  	processor.max_cstate=	[HW,ACPI]
>  			Limit processor to maximum C-state
>  			max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 299c2f0e7350..d455d1bd0d2c 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
>  static char *log_buf = __log_buf;
>  static u32 log_buf_len = __LOG_BUF_LEN;
>  
> +/*
> + * When true, printing to console will happen synchronously unless someone else
> + * is already printing messages.
> + */
> +static bool __read_mostly printk_sync;
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +#define PRINTK_PENDING_WAKEUP	0x01
> +#define PRINTK_PENDING_OUTPUT	0x02
> +
> +static DEFINE_PER_CPU(int, printk_pending);
> +
> +static void printing_work_func(struct work_struct *work)
> +{
> +	console_lock();
> +	console_unlock();
> +}
> +
> +static DECLARE_WORK(printing_work, printing_work_func);
> +
> +static void wake_up_klogd_work_func(struct irq_work *irq_work)
> +{
> +	int pending = __this_cpu_xchg(printk_pending, 0);
> +
> +	/*
> +	 * We just schedule regular work to do the printing from irq work. We
> +	 * don't want to do printing here directly as that happens with
> +	 * interrupts disabled and thus is bad for interrupt latency. We also
> +	 * don't want to queue regular work from vprintk_emit() as that gets
> +	 * called in various difficult contexts where schedule_work() could
> +	 * deadlock.
> +	 */
> +	if (pending & PRINTK_PENDING_OUTPUT)
> +		schedule_work(&printing_work);
> +
> +	if (pending & PRINTK_PENDING_WAKEUP)
> +		wake_up_interruptible(&log_wait);
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> +	.func = wake_up_klogd_work_func,
> +	.flags = IRQ_WORK_LAZY,
> +};
> +
> +void wake_up_klogd(void)
> +{
> +	preempt_disable();
> +	if (waitqueue_active(&log_wait)) {
> +		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	}
> +	preempt_enable();
> +}
> +
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	unsigned long flags;
>  	int this_cpu;
>  	int printed_len = 0;
> -	bool in_sched = false;
> +	bool sync_print = printk_sync;
>  	/* cpu currently holding logbuf_lock in this function */
>  	static unsigned int logbuf_cpu = UINT_MAX;
>  
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> -		in_sched = true;
> +		sync_print = false;
>  	}
> -
>  	boot_delay_msec(level);
>  	printk_delay();
>  
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Disable preemption to avoid being preempted while holding
> @@ -2688,60 +2768,6 @@ late_initcall(printk_late_init);
>  
>  #if defined CONFIG_PRINTK
>  /*
> - * Delayed printk version, for scheduler-internal messages:
> - */
> -#define PRINTK_PENDING_WAKEUP	0x01
> -#define PRINTK_PENDING_OUTPUT	0x02
> -
> -static DEFINE_PER_CPU(int, printk_pending);
> -
> -static void wake_up_klogd_work_func(struct irq_work *irq_work)
> -{
> -	int pending = __this_cpu_xchg(printk_pending, 0);
> -
> -	if (pending & PRINTK_PENDING_OUTPUT) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> -	}
> -
> -	if (pending & PRINTK_PENDING_WAKEUP)
> -		wake_up_interruptible(&log_wait);
> -}
> -
> -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
> -	.func = wake_up_klogd_work_func,
> -	.flags = IRQ_WORK_LAZY,
> -};
> -
> -void wake_up_klogd(void)
> -{
> -	preempt_disable();
> -	if (waitqueue_active(&log_wait)) {
> -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	}
> -	preempt_enable();
> -}
> -
> -int printk_deferred(const char *fmt, ...)
> -{
> -	va_list args;
> -	int r;
> -
> -	preempt_disable();
> -	va_start(args, fmt);
> -	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> -	va_end(args);
> -
> -	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	preempt_enable();
> -
> -	return r;
> -}
> -
> -/*
>   * printk rate limiting, lifted from the networking subsystem.
>   *
>   * This enforces a rate limit: not more than 10 kernel messages
> -- 
> 2.6.2
> 

> From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 14:26:13 +0100
> Subject: [PATCH 2/2] printk: Skip messages on oops
> 
> When there are too many messages in the kernel printk buffer it can take
> very long to print them to console (especially when using slow serial
> console). This is undesirable during oops so when we encounter oops and
> there are more than 100 messages to print, print just the newest 100
> messages and then the oops message.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d455d1bd0d2c..fc67ab70e9c7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -262,6 +262,9 @@ static u64 console_seq;
>  static u32 console_idx;
>  static enum log_flags console_prev;
>  
> +/* current record sequence when oops happened */
> +static u64 oops_start_seq;
> +
>  /* the next printk record to read after the last 'clear' command */
>  static u64 clear_seq;
>  static u32 clear_idx;
> @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  					 NULL, 0, recursion_msg,
>  					 strlen(recursion_msg));
>  	}
> +	if (oops_in_progress && !sync_print && !oops_start_seq)
> +		oops_start_seq = log_next_seq;
>  
>  	/*
>  	 * The printf needs to come first; we need the syslog
> @@ -2292,6 +2297,12 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * When oops happens and there are more messages to be printed in the printk
> + * buffer that this, skip some mesages and print only this many newest messages.
> + */
> +#define PRINT_MSGS_BEFORE_OOPS 100
> +
>  /**
>   * console_unlock - unlock the console system
>   *
> @@ -2348,7 +2359,28 @@ again:
>  			seen_seq = log_next_seq;
>  		}
>  
> -		if (console_seq < log_first_seq) {
> +		/*
> +		 * If oops happened and there are more than
> +		 * PRINT_MSGS_BEFORE_OOPS messages pending before oops message,
> +		 * skip them to make oops appear faster.
> +		 */
> +		if (oops_start_seq &&
> +		    console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) {
> +			len = sprintf(text,
> +				"** %u printk messages dropped due to oops ** ",
> +				(unsigned)(oops_start_seq - console_seq -
> +					   PRINT_MSGS_BEFORE_OOPS));
> +			if (console_seq < log_first_seq) {
> +				console_seq = log_first_seq;
> +				console_idx = log_first_idx;
> +			}
> +			while (console_seq <
> +			       oops_start_seq - PRINT_MSGS_BEFORE_OOPS) {
> +				console_idx = log_next(console_idx);
> +				console_seq++;
> +			}
> +			console_prev = 0;
> +		} else if (console_seq < log_first_seq) {
>  			len = sprintf(text, "** %u printk messages dropped ** ",
>  				      (unsigned)(log_first_seq - console_seq));
>  
> -- 
> 2.6.2
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
@ 2015-12-23  1:54   ` Sergey Senozhatsky
  2015-12-23  3:37     ` Sergey Senozhatsky
  2016-01-05 14:37     ` Jan Kara
  2015-12-31  2:44   ` Sergey Senozhatsky
  2016-01-11 12:54   ` Petr Mladek
  3 siblings, 2 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  1:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hi,

slowly looking through the patches.

On (12/22/15 14:47), Jan Kara wrote:
[..]
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);

can we replace this oops_in_progress check with something more reliable?

CPU0                                CPU1 - CPUN
panic()
 local_irq_disable()                executing foo() with irqs disabled,
 console_verbose()                  or processing an extremely long irq handler.
 bust_spinlocks()
    oops_in_progress++

 smp_send_stop()

 bust_spinlocks()
    oops_in_progress--              ok, IPI arrives
                                    dump_stack()/printk()/etc from IPI_CPU_STOP
			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.

smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
all. Some do. For example, arm/arm64:

void smp_send_stop(void)
...
        /* Wait up to one second for other CPUs to stop */
        timeout = USEC_PER_SEC;
        while (num_online_cpus() > 1 && timeout--)
                udelay(1);

        if (num_online_cpus() > 1)
                pr_warn("SMP: failed to stop secondary CPUs\n");
...


so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.


bust_spinlocks(0) does
...
	if (--oops_in_progress == 0)
		wake_up_klogd();
...

but local cpu has irqs disabled and `panic_timeout' can be zero.

How about setting 'sync_print' to 'true' in...
  bust_spinlocks() /* only set to true */
or
  console_verbose() /* um... may be... */
or
  having a separate one-liner for that

void console_panic_mode(void)
{
	sync_print = true;
}

and call it early in panic(), before we send out IPI_STOP.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  1:54   ` Sergey Senozhatsky
@ 2015-12-23  3:37     ` Sergey Senozhatsky
  2015-12-23  3:57       ` Sergey Senozhatsky
  2016-01-05 14:37     ` Jan Kara
  1 sibling, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  3:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 10:54), Sergey Senozhatsky wrote:
> On (12/22/15 14:47), Jan Kara wrote:
> [..]
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	} else
> > +		sync_print = true;

oops, didn't have enough coffee... missed that `else sync_print = true' :(

	-ss

> >  	local_irq_restore(flags);
> 
> can we replace this oops_in_progress check with something more reliable?
> 
> CPU0                                CPU1 - CPUN
> panic()
>  local_irq_disable()                executing foo() with irqs disabled,
>  console_verbose()                  or processing an extremely long irq handler.
>  bust_spinlocks()
>     oops_in_progress++
> 
>  smp_send_stop()
> 
>  bust_spinlocks()
>     oops_in_progress--              ok, IPI arrives
>                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> 
> smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> all. Some do. For example, arm/arm64:
> 
> void smp_send_stop(void)
> ...
>         /* Wait up to one second for other CPUs to stop */
>         timeout = USEC_PER_SEC;
>         while (num_online_cpus() > 1 && timeout--)
>                 udelay(1);
> 
>         if (num_online_cpus() > 1)
>                 pr_warn("SMP: failed to stop secondary CPUs\n");
> ...
> 
> 
> so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> 
> 
> bust_spinlocks(0) does
> ...
> 	if (--oops_in_progress == 0)
> 		wake_up_klogd();
> ...
> 
> but local cpu has irqs disabled and `panic_timeout' can be zero.
> 
> How about setting 'sync_print' to 'true' in...
>   bust_spinlocks() /* only set to true */
> or
>   console_verbose() /* um... may be... */
> or
>   having a separate one-liner for that
> 
> void console_panic_mode(void)
> {
> 	sync_print = true;
> }
> 
> and call it early in panic(), before we send out IPI_STOP.
> 
> 	-ss
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  3:37     ` Sergey Senozhatsky
@ 2015-12-23  3:57       ` Sergey Senozhatsky
  2015-12-23  4:15         ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  3:57 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 12:37), Sergey Senozhatsky wrote:
> Date: Wed, 23 Dec 2015 12:37:24 +0900
> From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Cc: Jan Kara <jack@suse.cz>, Sergey Senozhatsky
>  <sergey.senozhatsky@gmail.com>, Andrew Morton <akpm@linux-foundation.org>,
>  Petr Mladek <pmladek@suse.cz>, KY Sri nivasan <kys@microsoft.com>, Steven
>  Rostedt <rostedt@goodmis.org>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 1/7] printk: Hand over printing to console if printing
>  too long
> User-Agent: Mutt/1.5.24 (2015-08-30)
> 
> On (12/23/15 10:54), Sergey Senozhatsky wrote:
> > On (12/22/15 14:47), Jan Kara wrote:
> > [..]
> > > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> > >  	logbuf_cpu = UINT_MAX;
> > >  	raw_spin_unlock(&logbuf_lock);
> > >  	lockdep_on();
> > > +	/*
> > > +	 * By default we print message to console asynchronously so that kernel
> > > +	 * doesn't get stalled due to slow serial console. That can lead to
> > > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > > +	 * printing load.
> > > +	 *
> > > +	 * However we resort to synchronous printing of messages during early
> > > +	 * boot, when oops is in progress, or when synchronous printing was
> > > +	 * explicitely requested by kernel parameter.
> > > +	 */
> > > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > +	} else
> > > +		sync_print = true;
> 
> oops, didn't have enough coffee... missed that `else sync_print = true' :(
> 

ah, never mind my previous email... it's a local variable, so the very next printk()
happening right after bust_spinlocks(0) will irq_work_queue(). I'd prefer CPUs to
print stacks rather than burn cpu cycles in `while (1) cpu_relax()' loop.

so

else {
	printk_sync = true;
	sync_print = true; /* and remove this local variable entirely may be*/
}

> > can we replace this oops_in_progress check with something more reliable?
> > 
> > CPU0                                CPU1 - CPUN
> > panic()
> >  local_irq_disable()                executing foo() with irqs disabled,
> >  console_verbose()                  or processing an extremely long irq handler.
> >  bust_spinlocks()
> >     oops_in_progress++

					or we huge enough number of CPUs, `deep' stack
					traces, slow serial and CPU doing dump_stack()
					under raw_spin_lock(&stop_lock), so it can take
					longer than 1 second to print the stacks and
					thus panic CPU will set oops_in_progress back
					to 0.

> >  smp_send_stop()
> > 
> >  bust_spinlocks()
> >     oops_in_progress--              ok, IPI arrives
> >                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> > 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> > 
> > smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> > and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> > all. Some do. For example, arm/arm64:
> > 
> > void smp_send_stop(void)
> > ...
> >         /* Wait up to one second for other CPUs to stop */
> >         timeout = USEC_PER_SEC;
> >         while (num_online_cpus() > 1 && timeout--)
> >                 udelay(1);
> > 
> >         if (num_online_cpus() > 1)
> >                 pr_warn("SMP: failed to stop secondary CPUs\n");
> > ...
> > 
> > 
> > so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> > and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> > 
> > 
> > bust_spinlocks(0) does
> > ...
> > 	if (--oops_in_progress == 0)
> > 		wake_up_klogd();
> > ...
> > 
> > but local cpu has irqs disabled and `panic_timeout' can be zero.
> > 
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;

	printk_sync = true;

> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.


	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  3:57       ` Sergey Senozhatsky
@ 2015-12-23  4:15         ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-23  4:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/23/15 12:57), Sergey Senozhatsky wrote:
[..]
> > > can we replace this oops_in_progress check with something more reliable?
> > > 
> > > CPU0                                CPU1 - CPUN
> > > panic()
> > >  local_irq_disable()                executing foo() with irqs disabled,
> > >  console_verbose()                  or processing an extremely long irq handler.
> > >  bust_spinlocks()
> > >     oops_in_progress++
> 
> 					or we huge enough number of CPUs, `deep' stack
> 					traces, slow serial and CPU doing dump_stack()
> 					under raw_spin_lock(&stop_lock), so it can take
> 					longer than 1 second to print the stacks and
> 					thus panic CPU will set oops_in_progress back
> 					to 0.
> 
> > >  smp_send_stop()
> > > 
> > >  bust_spinlocks()
> > >     oops_in_progress--              ok, IPI arrives
> > >                                     dump_stack()/printk()/etc from IPI_CPU_STOP
> > > 			            "while (1) cpu_relax()" with irq/fiq disabled/halt/etc.
> > > 
> > > smp_send_stop() wrapped in `oops_in_progress++/oops_in_progress--' is arch specific,
> > > and some platforms don't do any IPI-delivered (e.g. via num_online_cpus()) checks at
> > > all. Some do. For example, arm/arm64:
> > > 
> > > void smp_send_stop(void)
> > > ...
> > >         /* Wait up to one second for other CPUs to stop */
> > >         timeout = USEC_PER_SEC;
> > >         while (num_online_cpus() > 1 && timeout--)
> > >                 udelay(1);
> > > 
> > >         if (num_online_cpus() > 1)
> > >                 pr_warn("SMP: failed to stop secondary CPUs\n");
> > > ...
> > > 
> > > 
> > > so there are non-zero chances that IPI will arrive to CPU after 'oops_in_progress--',
> > > and thus dump_stack()/etc. happening on that/those cpu/cpus will be lost.
> > > 
> > > 
> > > bust_spinlocks(0) does
> > > ...
> > > 	if (--oops_in_progress == 0)
> > > 		wake_up_klogd();
> > > ...
> > > 
> > > but local cpu has irqs disabled and `panic_timeout' can be zero.

well, if panic_timeout != 0, then wake_up_klogd() calls irq_work_queue() which
schedule_work. what if we have the following

CPU0						CPU1 - CPUN

foo
preempt_disable
	bar
		panic				irq/fiq disable
			schedule_work		while (1) cpu_relax

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
  2015-12-22 14:48   ` Sergey Senozhatsky
  2015-12-23  1:54   ` Sergey Senozhatsky
@ 2015-12-31  2:44   ` Sergey Senozhatsky
  2015-12-31  3:13     ` Sergey Senozhatsky
  2016-01-11 12:54   ` Petr Mladek
  3 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  2:44 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hello,

On (12/22/15 14:47), Jan Kara wrote:
[..]
> +int printk_deferred(const char *fmt, ...)
> +{
> +	va_list args;
> +	int r;
> +
> +	va_start(args, fmt);
> +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> +	va_end(args);
> +
> +	return r;
> +}
[..]
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);

So this fixes printk() and printk_deferred(), but it doesn't address any of the
direct and indirect console_lock/console_unlock callers.

for example, direct:
~/_mmots$ git grep console_unlock | egrep -v "printk\.c|panic\.c|console\.h" | wc -l
199

indirect (e.g. via console_devices()):
~/_mmots$ git grep console_device | egrep -v "printk\.c|panic\.c|console\.h|_console_device" | wc -l
4

One of those indirect callers is tty_lookup_driver(), called from tty_open(). Which
is quite big to ignore, I suspect.

A user space process opening a tty can end up doing that while (1) call_console_drivers()
loop, I suspect. At least nothing prevents it, at a glance.

A side note, isn't it too often to cond_resched() from console_unlock()? What if
we have 10000000 very short printk() messages (e.g. no more than 32 chars).

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  2:44   ` Sergey Senozhatsky
@ 2015-12-31  3:13     ` Sergey Senozhatsky
  2015-12-31  4:58       ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  3:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On (12/31/15 11:44), Sergey Senozhatsky wrote:
> On (12/22/15 14:47), Jan Kara wrote:
> [..]
> > +int printk_deferred(const char *fmt, ...)
> > +{
> > +	va_list args;
> > +	int r;
> > +
> > +	va_start(args, fmt);
> > +	r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args);
> > +	va_end(args);
> > +
> > +	return r;
> > +}
> [..]
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > +	} else
> > +		sync_print = true;
> >  	local_irq_restore(flags);
> 
> So this fixes printk() and printk_deferred(), but it doesn't address any of the
> direct and indirect console_lock/console_unlock callers.
> 
> for example, direct:
> ~/_mmots$ git grep console_unlock | egrep -v "printk\.c|panic\.c|console\.h" | wc -l
> 199
> 
> indirect (e.g. via console_devices()):
> ~/_mmots$ git grep console_device | egrep -v "printk\.c|panic\.c|console\.h|_console_device" | wc -l
> 4
> 
> One of those indirect callers is tty_lookup_driver(), called from tty_open(). Which
> is quite big to ignore, I suspect.
>
> 
> A user space process opening a tty can end up doing that while (1) call_console_drivers()
> loop, I suspect. At least nothing prevents it, at a glance.

d'oh... sorry. that cold that I have is affecting me... no more emails for today.


cond_resched() does its job there, of course. well, a user process still can
do a lot of call_console_drivers() calls. may be we can check who is calling
console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
test) AND a user process then return from console_unlock() doing irq_work_queue()
and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.

	-ss

> A side note, isn't it too often to cond_resched() from console_unlock()? What if
> we have 10000000 very short printk() messages (e.g. no more than 32 chars).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  3:13     ` Sergey Senozhatsky
@ 2015-12-31  4:58       ` Sergey Senozhatsky
  2016-01-05 14:48         ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2015-12-31  4:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

On (12/31/15 12:13), Sergey Senozhatsky wrote:
[..]
> cond_resched() does its job there, of course. well, a user process still can
> do a lot of call_console_drivers() calls. may be we can check who is calling
> console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> test) AND a user process then return from console_unlock() doing irq_work_queue()
> and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.

attached two patches, I ended up having on top of yours. just in case.

    printk: factor out can_printk_async()
    
    console_unlock() can be called directly or indirectly by a user
    space process, so it can end up doing call_console_drivers() loop,
    which will hold it from returning back to user-space from a syscall
    for unpredictable amount of time.
    
    Factor out can_printk_async() function, which queues an irq work and
    sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
    vprintk_emit() already does it, add can_printk_async() call to
    console_unlock() for !PF_KTHREAD processes.

and

    printk: introduce console_sync_mode
    
    console_sync_mode() should be called early in panic() to switch
    printk from async mode to sync. Otherwise, STOP IPIs can arrive
    to other CPUs too late and those CPUs will see oops_in_progress
    being 0 again.

	-ss

[-- Attachment #2: 0001-printk-introduce-console_sync_mode.patch --]
[-- Type: text/x-diff, Size: 1791 bytes --]

>From c3fc955809adab8f497cdc7581e67e1fa29d6517 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Wed, 30 Dec 2015 20:39:12 +0900
Subject: [PATCH 1/2] printk: introduce console_sync_mode

console_sync_mode() should be called early in panic() to switch
printk from async mode to sync. Otherwise, STOP IPIs can arrive
to other CPUs too late and those CPUs will see oops_in_progress
being 0 again.
---
 include/linux/console.h | 1 +
 kernel/panic.c          | 1 +
 kernel/printk/printk.c  | 5 +++++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index bd19434..f068985 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -150,6 +150,7 @@ extern int console_trylock(void);
 extern void console_unlock(void);
 extern void console_conditional_schedule(void);
 extern void console_unblank(void);
+extern void console_sync_mode(void);
 extern struct tty_driver *console_device(int *);
 extern void console_stop(struct console *);
 extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index b333380..04c8ff4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -117,6 +117,7 @@ void panic(const char *fmt, ...)
 	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
+	console_sync_mode();
 	console_verbose();
 	bust_spinlocks(1);
 	va_start(args, fmt);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index de9d31b..47a70a2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2501,6 +2501,11 @@ void console_unblank(void)
 	console_unlock();
 }
 
+void console_sync_mode(void)
+{
+	printk_sync = true;
+}
+
 /*
  * Return the console tty driver structure and its associated index
  */
-- 
2.6.4


[-- Attachment #3: 0002-printk-factor-out-can_printk_async.patch --]
[-- Type: text/x-diff, Size: 3019 bytes --]

>From 92f2c0f2a5ed015caa2757dcfec4407d708f8628 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Thu, 31 Dec 2015 13:39:58 +0900
Subject: [PATCH 2/2] printk: factor out can_printk_async()

console_unlock() can be called directly or indirectly by a user
space process, so it can end up doing call_console_drivers() loop,
which will hold it from returning back to user-space from a syscall
for unpredictable amount of time.

Factor out can_printk_async() function, which queues an irq work and
sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
vprintk_emit() already does it, add can_printk_async() call to
console_unlock() for !PF_KTHREAD processes.
---
 kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 47a70a2..7d3a8e1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -355,6 +355,26 @@ int printk_deferred(const char *fmt, ...)
 	return r;
 }
 
+static bool can_printk_async(bool sync)
+{
+	/*
+	 * By default we print message to console asynchronously so that kernel
+	 * doesn't get stalled due to slow serial console. That can lead to
+	 * softlockups, lost interrupts, or userspace timing out under heavy
+	 * printing load.
+	 *
+	 * However we resort to synchronous printing of messages during early
+	 * boot, when oops is in progress, or when synchronous printing was
+	 * explicitely requested by kernel parameter.
+	 */
+	if (keventd_up() && !oops_in_progress && !sync) {
+		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
+		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
+		return true;
+	}
+	return false;
+}
+
 /* Return log buffer address */
 char *log_buf_addr_get(void)
 {
@@ -1889,20 +1909,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	logbuf_cpu = UINT_MAX;
 	raw_spin_unlock(&logbuf_lock);
 	lockdep_on();
-	/*
-	 * By default we print message to console asynchronously so that kernel
-	 * doesn't get stalled due to slow serial console. That can lead to
-	 * softlockups, lost interrupts, or userspace timing out under heavy
-	 * printing load.
-	 *
-	 * However we resort to synchronous printing of messages during early
-	 * boot, when oops is in progress, or when synchronous printing was
-	 * explicitely requested by kernel parameter.
-	 */
-	if (keventd_up() && !oops_in_progress && !sync_print) {
-		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
-		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
-	} else
+	if (!can_printk_async(sync_print))
 		sync_print = true;
 	local_irq_restore(flags);
 
@@ -2328,6 +2335,13 @@ void console_unlock(void)
 		return;
 	}
 
+	if (!(current->flags & PF_KTHREAD) &&
+			can_printk_async(printk_sync)) {
+		console_locked = 0;
+		up_console_sem();
+		return;
+	}
+
 	/*
 	 * Console drivers are called under logbuf_lock, so
 	 * @console_may_schedule should be cleared before; however, we may
-- 
2.6.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-23  1:54   ` Sergey Senozhatsky
  2015-12-23  3:37     ` Sergey Senozhatsky
@ 2016-01-05 14:37     ` Jan Kara
  2016-01-06  1:41       ` Sergey Senozhatsky
  2016-01-06  6:48       ` Sergey Senozhatsky
  1 sibling, 2 replies; 28+ messages in thread
From: Jan Kara @ 2016-01-05 14:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

Hi,

On Wed 23-12-15 10:54:49, Sergey Senozhatsky wrote:
> slowly looking through the patches.

Back from Christmas vacation...

> How about setting 'sync_print' to 'true' in...
>   bust_spinlocks() /* only set to true */
> or
>   console_verbose() /* um... may be... */
> or
>   having a separate one-liner for that
> 
> void console_panic_mode(void)
> {
> 	sync_print = true;
> }
> 
> and call it early in panic(), before we send out IPI_STOP.

I like using console_verbose() for setting sync_print to true. That will
likely be more reliable than using oops in progress. After all
console_verbose() is used like console_panic_mode() anyway and in quite a
few places so it is a reasonable match.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-31  4:58       ` Sergey Senozhatsky
@ 2016-01-05 14:48         ` Jan Kara
  2016-01-06  3:38           ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-01-05 14:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Thu 31-12-15 13:58:59, Sergey Senozhatsky wrote:
> On (12/31/15 12:13), Sergey Senozhatsky wrote:
> [..]
> > cond_resched() does its job there, of course. well, a user process still can
> > do a lot of call_console_drivers() calls. may be we can check who is calling
> > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> 
> attached two patches, I ended up having on top of yours. just in case.
> 
>     printk: factor out can_printk_async()
>     
>     console_unlock() can be called directly or indirectly by a user
>     space process, so it can end up doing call_console_drivers() loop,
>     which will hold it from returning back to user-space from a syscall
>     for unpredictable amount of time.
>     
>     Factor out can_printk_async() function, which queues an irq work and
>     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
>     vprintk_emit() already does it, add can_printk_async() call to
>     console_unlock() for !PF_KTHREAD processes.

I'd be cautious about changing this userspace visible behavior. Someone may
be relying on it... I agree that sometimes we can block userspace process
in kernel for a long time (e.g. in my testing I often see syslog process
doing the printing) but so far I didn't see / was notified about some real
problem with this. So unless I see some real user issues with user
processes doing printing for too long I would not touch this.
 
> and
> 
>     printk: introduce console_sync_mode
>     
>     console_sync_mode() should be called early in panic() to switch
>     printk from async mode to sync. Otherwise, STOP IPIs can arrive
>     to other CPUs too late and those CPUs will see oops_in_progress
>     being 0 again.

So as I wrote, I like this in principle but there are much more places
calling console_verbose() and all of them want console_sync_mode() as well.
So I prefer hiding the sync printing in console_verbose() and possibly
renaming it to something better but I'm not sure renaming is worth it.

								Honza

> 
> 	-ss

> From c3fc955809adab8f497cdc7581e67e1fa29d6517 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Wed, 30 Dec 2015 20:39:12 +0900
> Subject: [PATCH 1/2] printk: introduce console_sync_mode
> 
> console_sync_mode() should be called early in panic() to switch
> printk from async mode to sync. Otherwise, STOP IPIs can arrive
> to other CPUs too late and those CPUs will see oops_in_progress
> being 0 again.
> ---
>  include/linux/console.h | 1 +
>  kernel/panic.c          | 1 +
>  kernel/printk/printk.c  | 5 +++++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index bd19434..f068985 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -150,6 +150,7 @@ extern int console_trylock(void);
>  extern void console_unlock(void);
>  extern void console_conditional_schedule(void);
>  extern void console_unblank(void);
> +extern void console_sync_mode(void);
>  extern struct tty_driver *console_device(int *);
>  extern void console_stop(struct console *);
>  extern void console_start(struct console *);
> diff --git a/kernel/panic.c b/kernel/panic.c
> index b333380..04c8ff4 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -117,6 +117,7 @@ void panic(const char *fmt, ...)
>  	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
>  		panic_smp_self_stop();
>  
> +	console_sync_mode();
>  	console_verbose();
>  	bust_spinlocks(1);
>  	va_start(args, fmt);
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index de9d31b..47a70a2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2501,6 +2501,11 @@ void console_unblank(void)
>  	console_unlock();
>  }
>  
> +void console_sync_mode(void)
> +{
> +	printk_sync = true;
> +}
> +
>  /*
>   * Return the console tty driver structure and its associated index
>   */
> -- 
> 2.6.4
> 

> From 92f2c0f2a5ed015caa2757dcfec4407d708f8628 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Thu, 31 Dec 2015 13:39:58 +0900
> Subject: [PATCH 2/2] printk: factor out can_printk_async()
> 
> console_unlock() can be called directly or indirectly by a user
> space process, so it can end up doing call_console_drivers() loop,
> which will hold it from returning back to user-space from a syscall
> for unpredictable amount of time.
> 
> Factor out can_printk_async() function, which queues an irq work and
> sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> vprintk_emit() already does it, add can_printk_async() call to
> console_unlock() for !PF_KTHREAD processes.
> ---
>  kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 47a70a2..7d3a8e1 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -355,6 +355,26 @@ int printk_deferred(const char *fmt, ...)
>  	return r;
>  }
>  
> +static bool can_printk_async(bool sync)
> +{
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> +		return true;
> +	}
> +	return false;
> +}
> +
>  /* Return log buffer address */
>  char *log_buf_addr_get(void)
>  {
> @@ -1889,20 +1909,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> -	/*
> -	 * By default we print message to console asynchronously so that kernel
> -	 * doesn't get stalled due to slow serial console. That can lead to
> -	 * softlockups, lost interrupts, or userspace timing out under heavy
> -	 * printing load.
> -	 *
> -	 * However we resort to synchronous printing of messages during early
> -	 * boot, when oops is in progress, or when synchronous printing was
> -	 * explicitely requested by kernel parameter.
> -	 */
> -	if (keventd_up() && !oops_in_progress && !sync_print) {
> -		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> -		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> -	} else
> +	if (!can_printk_async(sync_print))
>  		sync_print = true;
>  	local_irq_restore(flags);
>  
> @@ -2328,6 +2335,13 @@ void console_unlock(void)
>  		return;
>  	}
>  
> +	if (!(current->flags & PF_KTHREAD) &&
> +			can_printk_async(printk_sync)) {
> +		console_locked = 0;
> +		up_console_sem();
> +		return;
> +	}
> +
>  	/*
>  	 * Console drivers are called under logbuf_lock, so
>  	 * @console_may_schedule should be cleared before; however, we may
> -- 
> 2.6.4
> 

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:37     ` Jan Kara
@ 2016-01-06  1:41       ` Sergey Senozhatsky
  2016-01-06  6:48       ` Sergey Senozhatsky
  1 sibling, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  1:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

Hello,

On (01/05/16 15:37), Jan Kara wrote:
> Hi,
> 
> On Wed 23-12-15 10:54:49, Sergey Senozhatsky wrote:
> > slowly looking through the patches.
> 
> Back from Christmas vacation...
> 
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;
> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.
> 
> I like using console_verbose() for setting sync_print to true. That will
> likely be more reliable than using oops in progress. After all
> console_verbose() is used like console_panic_mode() anyway and in quite a
> few places so it is a reasonable match.

Agree, only arch/microblaze/kernel/setup.c and arch/nios2/kernel/setup.c
do console_verbose() early in setup_arch(), the rest seems to be what I
was thinking of.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:48         ` Jan Kara
@ 2016-01-06  3:38           ` Sergey Senozhatsky
  2016-01-06  8:36             ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  3:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

Hello,

On (01/05/16 15:48), Jan Kara wrote:
> > [..]
> > > cond_resched() does its job there, of course. well, a user process still can
> > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > 
> > attached two patches, I ended up having on top of yours. just in case.
> > 
> >     printk: factor out can_printk_async()
> >     
> >     console_unlock() can be called directly or indirectly by a user
> >     space process, so it can end up doing call_console_drivers() loop,
> >     which will hold it from returning back to user-space from a syscall
> >     for unpredictable amount of time.
> >     
> >     Factor out can_printk_async() function, which queues an irq work and
> >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> >     vprintk_emit() already does it, add can_printk_async() call to
> >     console_unlock() for !PF_KTHREAD processes.
> 
> I'd be cautious about changing this userspace visible behavior. Someone may
> be relying on it... I agree that sometimes we can block userspace process
> in kernel for a long time (e.g. in my testing I often see syslog process
> doing the printing) but so far I didn't see / was notified about some real
> problem with this. So unless I see some real user issues with user
> processes doing printing for too long I would not touch this.

hm, interesting point.

/* random thoughts, I'm still on sick leave */

do we really have a user visible behaviour here that it really wants to have?
a task that does tty_open, for instance, hardly wants to end up doing a bunch
of call_console_drivers() calls in console_unlock(). it does look to me mostly
as unexpected side effect.

I probably can imagine someone writing a /usr/bin/flush_logs_to_serial
app that specifically depends on that behaviour, but that will require a bit
of hackery and trickery, since (seems) there is no syscall this app can call
that will perform *only* the required action:

	void force_flush_logs_to_serial(void)
	{
		console_lock();
		console_unlock();
	}

returning back to user space from syscalls quicker is a good thing, I'd
prefer user space apps to do less kernel job (by kernel job I mean
call_console_drivers() loop). well, at least on admittedly weird setups
that I have to deal with. but I may be missing something here.

some numbers

I added global `unsigned long k_ts, u_ts;' to accumulate time spent
in console_unlock() by PF_KTHREAD and !PF_KTHREAD correspondingly.

void console_unlock(void)
{
	...
	s_ts = local_clock();
	console_cont_flush(text, sizeof(text));
again:

	...

	if (wake_klogd)
		wake_up_klogd();

	e_ts = local_clock();
	if (time_after(e_ts, s_ts)) {
		if (current->flags & PF_KTHREAD)
			k_ts += (e_ts - s_ts);
		else
			u_ts += (e_ts - s_ts);
	}
}

and a procfs file to read the values

         unsigned long k = k_ts;
         unsigned long u = u_ts;
         unsigned long rem_nsec_k = do_div(k, 1000000000);
         unsigned long rem_nsec_u = do_div(u, 1000000000);

         return sprintf(buf, "kern:[%lu.%06lu] user:[%lu.%06lu]\n",
                         k, rem_nsec_k / 1000,
                         u, rem_nsec_u / 1000);

and w/o a lot of effort (no heavy printk message traffic)

$ cat /proc/1/time_in_console_unlock
kern:[4.241475] user:[4.077787]

that's user space spent almost the same amount of time to print kernel
messages as the kernel did on its own. which is hard to formulate as an
issue, it's just user space was doing for 4 seconds something it was not
really meant to do (at least from user space app developer's point of
view); so there is an unpredictable additional cost X added to some of
the syscalls.

	-ss

> >     printk: introduce console_sync_mode
> >     
> >     console_sync_mode() should be called early in panic() to switch
> >     printk from async mode to sync. Otherwise, STOP IPIs can arrive
> >     to other CPUs too late and those CPUs will see oops_in_progress
> >     being 0 again.
> 
> So as I wrote, I like this in principle but there are much more places
> calling console_verbose() and all of them want console_sync_mode() as well.
> So I prefer hiding the sync printing in console_verbose() and possibly
> renaming it to something better but I'm not sure renaming is worth it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-05 14:37     ` Jan Kara
  2016-01-06  1:41       ` Sergey Senozhatsky
@ 2016-01-06  6:48       ` Sergey Senozhatsky
  2016-01-06 12:25         ` Jan Kara
  1 sibling, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  6:48 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

On (01/05/16 15:37), Jan Kara wrote:
> > How about setting 'sync_print' to 'true' in...
> >   bust_spinlocks() /* only set to true */
> > or
> >   console_verbose() /* um... may be... */
> > or
> >   having a separate one-liner for that
> > 
> > void console_panic_mode(void)
> > {
> > 	sync_print = true;
> > }
> > 
> > and call it early in panic(), before we send out IPI_STOP.
> 
> I like using console_verbose() for setting sync_print to true. That will
> likely be more reliable than using oops in progress. After all
> console_verbose() is used like console_panic_mode() anyway and in quite a
> few places so it is a reasonable match.

another corner case.

a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html

:    This patch reduces the probability of such a lockup by introducing a
:    specialized kernel thread (oom_reaper) which tries to reclaim additional
:    memory by preemptively reaping the anonymous or swapped out memory owned
:    by the oom victim under an assumption that such a memory won't be needed
:    when its owner is killed and kicked from the userspace anyway.  There is
:    one notable exception to this, though, if the OOM victim was in the
:    process of coredumping the result would be incomplete.  This is considered
:    a reasonable constrain because the overall system health is more important
:    than debugability of a particular application.
:
:    A kernel thread has been chosen because we need a reliable way of
:    invocation so workqueue context is not appropriate because all the workers
:    might be busy (e.g.  allocating memory).  Kswapd which sounds like another
:    good fit is not appropriate as well because it might get blocked on locks
:    during reclaim as well.

particularly this "workqueue context is not appropriate because all the workers
might be busy (e.g.  allocating memory)" part. I think printk should switch to
sync mode in this case, since printk now does queue_work(system_wq, work).
um... console_verbose() call from oom kill? but it'll be nice to return back
to async mode once (if) memory pressure goes away.

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  3:38           ` Sergey Senozhatsky
@ 2016-01-06  8:36             ` Sergey Senozhatsky
  2016-01-06 10:21               ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06  8:36 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, Petr Mladek, KY Sri nivasan, Steven Rostedt,
	linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (01/06/16 12:38), Sergey Senozhatsky wrote:
> On (01/05/16 15:48), Jan Kara wrote:
> > > [..]
> > > > cond_resched() does its job there, of course. well, a user process still can
> > > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > > 
> > > attached two patches, I ended up having on top of yours. just in case.
> > > 
> > >     printk: factor out can_printk_async()
> > >     
> > >     console_unlock() can be called directly or indirectly by a user
> > >     space process, so it can end up doing call_console_drivers() loop,
> > >     which will hold it from returning back to user-space from a syscall
> > >     for unpredictable amount of time.
> > >     
> > >     Factor out can_printk_async() function, which queues an irq work and
> > >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> > >     vprintk_emit() already does it, add can_printk_async() call to
> > >     console_unlock() for !PF_KTHREAD processes.
> > 
> > I'd be cautious about changing this userspace visible behavior. Someone may
> > be relying on it... I agree that sometimes we can block userspace process
> > in kernel for a long time (e.g. in my testing I often see syslog process
> > doing the printing) but so far I didn't see / was notified about some real
> > problem with this. So unless I see some real user issues with user
> > processes doing printing for too long I would not touch this.
> 
> and w/o a lot of effort (no heavy printk message traffic)

or like this on another setup ([k|u]_ts updated to u64)

# cat /proc/1/time_in_console_unlock
kern:[12.755920] user:[38.367332]

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  8:36             ` Sergey Senozhatsky
@ 2016-01-06 10:21               ` Jan Kara
  2016-01-06 11:10                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-01-06 10:21 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Wed 06-01-16 17:36:53, Sergey Senozhatsky wrote:
> On (01/06/16 12:38), Sergey Senozhatsky wrote:
> > On (01/05/16 15:48), Jan Kara wrote:
> > > > [..]
> > > > > cond_resched() does its job there, of course. well, a user process still can
> > > > > do a lot of call_console_drivers() calls. may be we can check who is calling
> > > > > console_unlock() and if we have "!printk_sync && !oops_in_progress" (or just printk_sync
> > > > > test) AND a user process then return from console_unlock() doing irq_work_queue()
> > > > > and set PRINTK_PENDING_OUTPUT pending bit, the way vprintk_emit() does it.
> > > > 
> > > > attached two patches, I ended up having on top of yours. just in case.
> > > > 
> > > >     printk: factor out can_printk_async()
> > > >     
> > > >     console_unlock() can be called directly or indirectly by a user
> > > >     space process, so it can end up doing call_console_drivers() loop,
> > > >     which will hold it from returning back to user-space from a syscall
> > > >     for unpredictable amount of time.
> > > >     
> > > >     Factor out can_printk_async() function, which queues an irq work and
> > > >     sets a PRINTK_PENDING_OUTPUT pending bit (if we can do async printk).
> > > >     vprintk_emit() already does it, add can_printk_async() call to
> > > >     console_unlock() for !PF_KTHREAD processes.
> > > 
> > > I'd be cautious about changing this userspace visible behavior. Someone may
> > > be relying on it... I agree that sometimes we can block userspace process
> > > in kernel for a long time (e.g. in my testing I often see syslog process
> > > doing the printing) but so far I didn't see / was notified about some real
> > > problem with this. So unless I see some real user issues with user
> > > processes doing printing for too long I would not touch this.
> > 
> > and w/o a lot of effort (no heavy printk message traffic)
> 
> or like this on another setup ([k|u]_ts updated to u64)
> 
> # cat /proc/1/time_in_console_unlock
> kern:[12.755920] user:[38.367332]

So maybe that is worth addressing if it bothers you but please as a
separate patch set. This seems fairly independent and I think even current
version of the patches will be controversial enough...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06 10:21               ` Jan Kara
@ 2016-01-06 11:10                 ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-06 11:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, Petr Mladek, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On (01/06/16 11:21), Jan Kara wrote:
[..]
> > 
> > or like this on another setup ([k|u]_ts updated to u64)
> > 
> > # cat /proc/1/time_in_console_unlock
> > kern:[12.755920] user:[38.367332]
> 
> So maybe that is worth addressing if it bothers you but please as a
> separate patch set. This seems fairly independent and I think even current
> version of the patches will be controversial enough...

Agree

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06  6:48       ` Sergey Senozhatsky
@ 2016-01-06 12:25         ` Jan Kara
  2016-01-11 13:25           ` Sergey Senozhatsky
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-01-06 12:25 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, Petr Mladek,
	KY Sri nivasan, Steven Rostedt, linux-kernel

On Wed 06-01-16 15:48:36, Sergey Senozhatsky wrote:
> On (01/05/16 15:37), Jan Kara wrote:
> > > How about setting 'sync_print' to 'true' in...
> > >   bust_spinlocks() /* only set to true */
> > > or
> > >   console_verbose() /* um... may be... */
> > > or
> > >   having a separate one-liner for that
> > > 
> > > void console_panic_mode(void)
> > > {
> > > 	sync_print = true;
> > > }
> > > 
> > > and call it early in panic(), before we send out IPI_STOP.
> > 
> > I like using console_verbose() for setting sync_print to true. That will
> > likely be more reliable than using oops in progress. After all
> > console_verbose() is used like console_panic_mode() anyway and in quite a
> > few places so it is a reasonable match.
> 
> another corner case.
> 
> a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html
> 
> :    This patch reduces the probability of such a lockup by introducing a
> :    specialized kernel thread (oom_reaper) which tries to reclaim additional
> :    memory by preemptively reaping the anonymous or swapped out memory owned
> :    by the oom victim under an assumption that such a memory won't be needed
> :    when its owner is killed and kicked from the userspace anyway.  There is
> :    one notable exception to this, though, if the OOM victim was in the
> :    process of coredumping the result would be incomplete.  This is considered
> :    a reasonable constrain because the overall system health is more important
> :    than debugability of a particular application.
> :
> :    A kernel thread has been chosen because we need a reliable way of
> :    invocation so workqueue context is not appropriate because all the workers
> :    might be busy (e.g.  allocating memory).  Kswapd which sounds like another
> :    good fit is not appropriate as well because it might get blocked on locks
> :    during reclaim as well.
> 
> particularly this "workqueue context is not appropriate because all the workers
> might be busy (e.g.  allocating memory)" part. I think printk should switch to
> sync mode in this case, since printk now does queue_work(system_wq, work).
> um... console_verbose() call from oom kill? but it'll be nice to return back
> to async mode once (if) memory pressure goes away.

Hum, yes, some mechanism to switch to sync printing in case work cannot be
executed for a long time is probably needed. I'll think about it.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-12-22 13:47 ` Jan Kara
                     ` (2 preceding siblings ...)
  2015-12-31  2:44   ` Sergey Senozhatsky
@ 2016-01-11 12:54   ` Petr Mladek
  2016-01-12 14:00     ` Jan Kara
  3 siblings, 1 reply; 28+ messages in thread
From: Petr Mladek @ 2016-01-11 12:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Andrew Morton, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Tue 2015-12-22 14:47:30, Jan Kara wrote:
> On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> > *** in this email and in every later emails ***
> Over last few days I have worked on redoing the stuff as we
> discussed with Linus and Andrew at Kernel Summit and I have new patches
> which are working fine for me. I still want to test them on some machines
> having real issues with udev during boot but so far stress-testing with
> serial console slowed down to ~1000 chars/sec on other machines and VMs
> looks promising.
> 
> I'm attaching them in case you want to have a look. They are on top of
> Tejun's patch adding cond_resched() (which is essential). I'll officially
> submit the patches once the testing is finished (but I'm not sure when I
> get to the problematic HW...).
> 
> [1] http://www.spinics.net/lists/stable/msg111535.html
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

> >From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 13:10:31 +0100
> Subject: [PATCH 1/2] printk: Make printk() completely async
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	logbuf_cpu = UINT_MAX;
>  	raw_spin_unlock(&logbuf_lock);
>  	lockdep_on();
> +	/*
> +	 * By default we print message to console asynchronously so that kernel
> +	 * doesn't get stalled due to slow serial console. That can lead to
> +	 * softlockups, lost interrupts, or userspace timing out under heavy
> +	 * printing load.
> +	 *
> +	 * However we resort to synchronous printing of messages during early
> +	 * boot, when oops is in progress, or when synchronous printing was
> +	 * explicitely requested by kernel parameter.
> +	 */
> +	if (keventd_up() && !oops_in_progress && !sync_print) {
> +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));

I wonder if it is safe to use sync_print for early messages
from the scheduler. Well, in this case, we might need to print
the messages directly from the irq context because the system
workqueue is not ready yet :-(


> +	} else
> +		sync_print = true;
>  	local_irq_restore(flags);
>  
> -	/* If called from the scheduler, we can not call up(). */
> -	if (!in_sched) {
> +	if (sync_print) {
>  		lockdep_off();
>  		/*
>  		 * Disable preemption to avoid being preempted while holding

> >From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 21 Dec 2015 14:26:13 +0100
> Subject: [PATCH 2/2] printk: Skip messages on oops
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d455d1bd0d2c..fc67ab70e9c7 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -262,6 +262,9 @@ static u64 console_seq;
>  static u32 console_idx;
>  static enum log_flags console_prev;
>  
> +/* current record sequence when oops happened */
> +static u64 oops_start_seq;
> +
>  /* the next printk record to read after the last 'clear' command */
>  static u64 clear_seq;
>  static u32 clear_idx;
> @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  					 NULL, 0, recursion_msg,
>  					 strlen(recursion_msg));
>  	}
> +	if (oops_in_progress && !sync_print && !oops_start_seq)
> +		oops_start_seq = log_next_seq;

sync_print is false for scheduler messages here. Therefore we might
skip messages even when user set printk.synchronous on the
command line.

Otherwise, the patch set looks rather straightforward.

Best Regards,
Petr

>  	/*
>  	 * The printf needs to come first; we need the syslog
> @@ -2292,6 +2297,12 @@ out:
>  	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>  }
>  
> +/*
> + * When oops happens and there are more messages to be printed in the printk
> + * buffer that this, skip some mesages and print only this many newest messages.
> + */
> +#define PRINT_MSGS_BEFORE_OOPS 100
> +
>  /**
>   * console_unlock - unlock the console system
>   *

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-06 12:25         ` Jan Kara
@ 2016-01-11 13:25           ` Sergey Senozhatsky
  0 siblings, 0 replies; 28+ messages in thread
From: Sergey Senozhatsky @ 2016-01-11 13:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Petr Mladek, KY Sri nivasan, Steven Rostedt, linux-kernel

Hello Jan,

On (01/06/16 13:25), Jan Kara wrote:
[..]
> > a quote from -mm a74b6533ead8 http://www.spinics.net/lists/linux-mm/msg98990.html
[..]
> > particularly this "workqueue context is not appropriate because all the workers
> > might be busy (e.g.  allocating memory)" part. I think printk should switch to
> > sync mode in this case, since printk now does queue_work(system_wq, work).
> > um... console_verbose() call from oom kill? but it'll be nice to return back
> > to async mode once (if) memory pressure goes away.
> 
> Hum, yes, some mechanism to switch to sync printing in case work cannot be
> executed for a long time is probably needed. I'll think about it.

well, technically, worker_pool keeps ->watchdog_ts updated, so ,basically,
worker pool knows when it stall. with CONFIG_WQ_WATCHDOG enabled timer_fn
wq_watchdog_timer_fn() checks that value and pr_emerg(). in the worst case,
printk can depend on CONFIG_WQ_WATCHDOG (yes, this sounds a bit sad) --
which implies, however, potentially long print from timer_fn. having a
printk() specific timer_fn, that will do the same, is just a duplication of
functionality; and checking the value in every vprintk_emit() is not really
an option too, I'm afraid, there may be no printk calls for some time.
just my 5 cents. probably you have better ideas.


one another thing, include/linux/workqueue.h says

 : System-wide workqueues which are always present.
 :
 : system_wq is the one used by schedule[_delayed]_work[_on]().
 : Multi-CPU multi-threaded.  There are users which expect relatively
 : short queue flush time.  Don't queue works which can run for too
 : long.
 :
[..]
 :
 : system_long_wq is similar to system_wq but may host long running
 : works.  Queue flushing might take relatively long.
 :
 : system_unbound_wq is unbound workqueue.  Workers are not bound to
 : any specific CPU, not concurrency managed, and all queued works are
 : executed immediately as long as max_active limit is not reached and
 : resources are available.

wake_up_klogd_work_func() is using `system_wq' to do
'console_lock()/console_unlock()', both of which can take a long time.
should it be switched to `system_long_wq' or `system_unbound_wq'?

	-ss

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-01-11 12:54   ` Petr Mladek
@ 2016-01-12 14:00     ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2016-01-12 14:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jan Kara, Sergey Senozhatsky, Andrew Morton, KY Sri nivasan,
	Steven Rostedt, linux-kernel, Sergey Senozhatsky

On Mon 11-01-16 13:54:34, Petr Mladek wrote:
> On Tue 2015-12-22 14:47:30, Jan Kara wrote:
> > On Thu 10-12-15 23:52:51, Sergey Senozhatsky wrote:
> > > *** in this email and in every later emails ***
> > Over last few days I have worked on redoing the stuff as we
> > discussed with Linus and Andrew at Kernel Summit and I have new patches
> > which are working fine for me. I still want to test them on some machines
> > having real issues with udev during boot but so far stress-testing with
> > serial console slowed down to ~1000 chars/sec on other machines and VMs
> > looks promising.
> > 
> > I'm attaching them in case you want to have a look. They are on top of
> > Tejun's patch adding cond_resched() (which is essential). I'll officially
> > submit the patches once the testing is finished (but I'm not sure when I
> > get to the problematic HW...).
> > 
> > [1] http://www.spinics.net/lists/stable/msg111535.html
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> > >From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 21 Dec 2015 13:10:31 +0100
> > Subject: [PATCH 1/2] printk: Make printk() completely async
> > 
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	logbuf_cpu = UINT_MAX;
> >  	raw_spin_unlock(&logbuf_lock);
> >  	lockdep_on();
> > +	/*
> > +	 * By default we print message to console asynchronously so that kernel
> > +	 * doesn't get stalled due to slow serial console. That can lead to
> > +	 * softlockups, lost interrupts, or userspace timing out under heavy
> > +	 * printing load.
> > +	 *
> > +	 * However we resort to synchronous printing of messages during early
> > +	 * boot, when oops is in progress, or when synchronous printing was
> > +	 * explicitely requested by kernel parameter.
> > +	 */
> > +	if (keventd_up() && !oops_in_progress && !sync_print) {
> > +		__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > +		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> 
> I wonder if it is safe to use sync_print for early messages
> from the scheduler. Well, in this case, we might need to print
> the messages directly from the irq context because the system
> workqueue is not ready yet :-(

Yeah, you are right that early scheduler messages can currently result in a
deadlock. I'll handle this case. Thanks for catching this.

> > +	} else
> > +		sync_print = true;
> >  	local_irq_restore(flags);
> >  
> > -	/* If called from the scheduler, we can not call up(). */
> > -	if (!in_sched) {
> > +	if (sync_print) {
> >  		lockdep_off();
> >  		/*
> >  		 * Disable preemption to avoid being preempted while holding
> 
> > >From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 21 Dec 2015 14:26:13 +0100
> > Subject: [PATCH 2/2] printk: Skip messages on oops
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index d455d1bd0d2c..fc67ab70e9c7 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -262,6 +262,9 @@ static u64 console_seq;
> >  static u32 console_idx;
> >  static enum log_flags console_prev;
> >  
> > +/* current record sequence when oops happened */
> > +static u64 oops_start_seq;
> > +
> >  /* the next printk record to read after the last 'clear' command */
> >  static u64 clear_seq;
> >  static u32 clear_idx;
> > @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  					 NULL, 0, recursion_msg,
> >  					 strlen(recursion_msg));
> >  	}
> > +	if (oops_in_progress && !sync_print && !oops_start_seq)
> > +		oops_start_seq = log_next_seq;
> 
> sync_print is false for scheduler messages here. Therefore we might
> skip messages even when user set printk.synchronous on the
> command line.

Ah, right. I'll fix this. Thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2016-03-01 17:22   ` Denys Vlasenko
@ 2016-03-02  9:30     ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2016-03-02  9:30 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Jan Kara, Andrew Morton, LKML, pmladek, KY Srinivasan,
	Steven Rostedt, Jan Kara

On Tue 01-03-16 18:22:25, Denys Vlasenko wrote:
> On Mon, Oct 26, 2015 at 5:52 AM, Jan Kara <jack@suse.com> wrote:
> > This patch implements a mechanism where after printing specified number
> > of characters (tunable as a kernel parameter printk.offload_chars), CPU
> > doing printing asks for help by waking up one of dedicated kthreads.  As
> > soon as the printing CPU notices kthread got scheduled and is spinning
> > on print_lock dedicated for that purpose, it drops console_sem,
> > print_lock, and exits console_unlock(). Kthread then takes over printing
> > instead. This way no CPU should spend printing too long even if there
> > is heavy printk traffic.
> 
> 
> > +/*
> > + * Number of kernel threads for offloading printing. We need at least two so
> > + * that they can hand over printing from one to another one and thus switch
> > + * CPUs.
> > + */
> > +#define PRINTING_TASKS 2
> > +
> > +/* Wait queue printing kthreads sleep on when idle */
> > +static DECLARE_WAIT_QUEUE_HEAD(print_queue);
> 
> Having two tasks, not one, for printking for the case
> when console output is slow... sounds wasteful.
> 
> Can this be improved so that only one task is needed?

Probably we'll go with workqueue in the next version of the patch series.
But at least in this version you needed two tasks so that one task can hand
over printing to the other one and thus relieve the load from a CPU.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
@ 2016-03-01 17:22   ` Denys Vlasenko
  2016-03-02  9:30     ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Denys Vlasenko @ 2016-03-01 17:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, LKML, pmladek, KY Srinivasan, Steven Rostedt, Jan Kara

On Mon, Oct 26, 2015 at 5:52 AM, Jan Kara <jack@suse.com> wrote:
> This patch implements a mechanism where after printing specified number
> of characters (tunable as a kernel parameter printk.offload_chars), CPU
> doing printing asks for help by waking up one of dedicated kthreads.  As
> soon as the printing CPU notices kthread got scheduled and is spinning
> on print_lock dedicated for that purpose, it drops console_sem,
> print_lock, and exits console_unlock(). Kthread then takes over printing
> instead. This way no CPU should spend printing too long even if there
> is heavy printk traffic.


> +/*
> + * Number of kernel threads for offloading printing. We need at least two so
> + * that they can hand over printing from one to another one and thus switch
> + * CPUs.
> + */
> +#define PRINTING_TASKS 2
> +
> +/* Wait queue printing kthreads sleep on when idle */
> +static DECLARE_WAIT_QUEUE_HEAD(print_queue);

Having two tasks, not one, for printking for the case
when console output is slow... sounds wasteful.

Can this be improved so that only one task is needed?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/7] printk: Hand over printing to console if printing too long
  2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
@ 2015-10-26  4:52 ` Jan Kara
  2016-03-01 17:22   ` Denys Vlasenko
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2015-10-26  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, pmladek, KY Srinivasan, rostedt, Jan Kara

From: Jan Kara <jack@suse.cz>

Currently, console_unlock() prints messages from kernel printk buffer to
console while the buffer is non-empty. When serial console is attached,
printing is slow and thus other CPUs in the system have plenty of time
to append new messages to the buffer while one CPU is printing. Thus the
CPU can spend unbounded amount of time doing printing in console_unlock().
This is especially serious problem if the printk() calling
console_unlock() was called with interrupts disabled.

In practice users have observed a CPU can spend tens of seconds printing
in console_unlock() (usually during boot when hundreds of SCSI devices
are discovered) resulting in RCU stalls (CPU doing printing doesn't
reach quiescent state for a long time), softlockup reports (IPIs for the
printing CPU don't get served and thus other CPUs are spinning waiting
for the printing CPU to process IPIs), and eventually a machine death
(as messages from stalls and lockups append to printk buffer faster than
we are able to print). So these machines are unable to boot with serial
console attached. Also during artificial stress testing SATA disk
disappears from the system because its interrupts aren't served for too
long.

This patch implements a mechanism where after printing specified number
of characters (tunable as a kernel parameter printk.offload_chars), CPU
doing printing asks for help by waking up one of dedicated kthreads.  As
soon as the printing CPU notices kthread got scheduled and is spinning
on print_lock dedicated for that purpose, it drops console_sem,
print_lock, and exits console_unlock(). Kthread then takes over printing
instead. This way no CPU should spend printing too long even if there
is heavy printk traffic.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 Documentation/kernel-parameters.txt |  15 ++++
 kernel/printk/printk.c              | 173 ++++++++++++++++++++++++++++++++----
 2 files changed, 171 insertions(+), 17 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 22a4b687ea5b..df8adee975ba 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2958,6 +2958,21 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 			default: disabled
 
+	printk.offload_chars=
+			Printing to console can be relatively slow especially
+			in case of serial console. When there is intensive
+			printing happening from several cpus (as is the case
+			during boot), a cpu can be spending significant time
+			(seconds or more) doing printing. To avoid softlockups,
+			lost interrupts, and similar problems other cpus
+			will take over printing after the currently printing
+			cpu has printed 'printk.offload_chars' characters.
+			Higher value means possibly longer interrupt and other
+			latencies but lower overhead of printing due to handing
+			over of printing.
+			Format: <number> (0 = disabled)
+			default: 1000
+
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8f0324ef72ab..1b26263edfa7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -46,6 +46,7 @@
 #include <linux/utsname.h>
 #include <linux/ctype.h>
 #include <linux/uio.h>
+#include <linux/kthread.h>
 
 #include <asm/uaccess.h>
 
@@ -78,6 +79,29 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+/*
+ * This spinlock is taken when printing to console. It is used only so that
+ * we can spin on it when some other thread wants to take over printing to
+ * console.
+ */
+static DEFINE_SPINLOCK(print_lock);
+
+/*
+ * Number of printing threads spinning on print_lock. Can go away once
+ * spin_is_contended() is reliable.
+ */
+static atomic_t printing_tasks_spinning = ATOMIC_INIT(0);
+
+/*
+ * Number of kernel threads for offloading printing. We need at least two so
+ * that they can hand over printing from one to another one and thus switch
+ * CPUs.
+ */
+#define PRINTING_TASKS 2
+
+/* Wait queue printing kthreads sleep on when idle */
+static DECLARE_WAIT_QUEUE_HEAD(print_queue);
+
 #ifdef CONFIG_LOCKDEP
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
@@ -279,6 +303,18 @@ static u32 clear_idx;
 static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
 static char *log_buf = __log_buf;
 static u32 log_buf_len = __LOG_BUF_LEN;
+/*
+ * How many characters can we print in one call of printk before asking
+ * other cpus to continue printing. 0 means infinity. Tunable via
+ * printk.offload_chars kernel parameter. Our default 1000 means about
+ * 0.1s maximum latency due to printing.
+ */
+static unsigned int __read_mostly printk_offload_chars = 1000;
+
+module_param_named(offload_chars, printk_offload_chars, uint,
+		   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(offload_chars, "offload printing to console to a different"
+	" cpu after this number of characters");
 
 /* Return log buffer address */
 char *log_buf_addr_get(void)
@@ -2208,15 +2244,41 @@ out:
 	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
 }
 
+/*
+ * Returns true iff there is other cpu waiting to take over printing. This
+ * function also takes are of setting PRINTK_HANDOVER_B if we want to hand over
+ * printing to some other cpu.
+ */
+static bool cpu_stop_printing(int printed_chars)
+{
+	/* Oops? Print everything now to maximize chances user will see it */
+	if (oops_in_progress)
+		return false;
+	if (!printk_offload_chars || printed_chars < printk_offload_chars)
+		return false;
+	/*
+	 * Make sure we load fresh value of printing_tasks_spinning. Matches
+	 * the barrier in printing_task()
+	 */
+	smp_rmb();
+	if (atomic_read(&printing_tasks_spinning))
+		return true;
+	wake_up(&print_queue);
+
+	return false;
+}
+
 /**
  * console_unlock - unlock the console system
  *
  * Releases the console_lock which the caller holds on the console system
  * and the console driver list.
  *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
+ * While the console_lock was held, console output may have been buffered by
+ * printk(). If this is the case, console_unlock() emits the output prior to
+ * releasing the lock. However we need not write all the data in the buffer if
+ * we would hog the CPU for too long. In such case we try to hand over printing
+ * to a different cpu.
  *
  * If there is output waiting, we wake /dev/kmsg and syslog() users.
  *
@@ -2230,6 +2292,8 @@ void console_unlock(void)
 	unsigned long flags;
 	bool wake_klogd = false;
 	bool retry;
+	bool hand_over = false;
+	int printed_chars = 0;
 
 	if (console_suspended) {
 		up_console_sem();
@@ -2241,13 +2305,20 @@ void console_unlock(void)
 	/* flush buffered message fragment immediately to console */
 	console_cont_flush(text, sizeof(text));
 again:
+	retry = false;
+	spin_lock_irqsave(&print_lock, flags);
 	for (;;) {
 		struct printk_log *msg;
 		size_t ext_len = 0;
 		size_t len;
 		int level;
 
-		raw_spin_lock_irqsave(&logbuf_lock, flags);
+		if (cpu_stop_printing(printed_chars)) {
+			hand_over = true;
+			break;
+		}
+
+		raw_spin_lock(&logbuf_lock);
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;
@@ -2265,8 +2336,10 @@ again:
 			len = 0;
 		}
 skip:
-		if (console_seq == log_next_seq)
+		if (console_seq == log_next_seq) {
+			raw_spin_unlock(&logbuf_lock);
 			break;
+		}
 
 		msg = log_from_idx(console_idx);
 		if (msg->flags & LOG_NOCONS) {
@@ -2306,28 +2379,38 @@ skip:
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(level, ext_text, ext_len, text, len);
 		start_critical_timings();
-		local_irq_restore(flags);
+		printed_chars += len;
 	}
-	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))
 		exclusive_console = NULL;
 
-	raw_spin_unlock(&logbuf_lock);
-
+	console_locked = 0;
 	up_console_sem();
 
+	if (!hand_over) {
+		/*
+		 * Someone could have filled up the buffer again, so re-check
+		 * if there's something to flush. We perform the check under
+		 * print_lock to save one cli / sti pair in the fast path.
+		 */
+		raw_spin_lock(&logbuf_lock);
+		retry = console_seq != log_next_seq;
+		raw_spin_unlock(&logbuf_lock);
+	}
+
 	/*
-	 * Someone could have filled up the buffer again, so re-check if there's
-	 * something to flush. In case we cannot trylock the console_sem again,
-	 * there's a new owner and the console_unlock() from them will do the
-	 * flush, no worries.
-	 */
-	raw_spin_lock(&logbuf_lock);
-	retry = console_seq != log_next_seq;
-	raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+	 * Release print_lock after console_sem so that printing_task()
+	 * succeeds in getting console_sem (unless someone else takes it and
+	 * then he'll be responsible for printing).
+         */
+	spin_unlock_irqrestore(&print_lock, flags);
 
+	/*
+	 * In case we cannot trylock the console_sem again, there's a new owner
+	 * and the console_unlock() from them will do the flush, no worries.
+	 */
 	if (retry && console_trylock())
 		goto again;
 
@@ -2654,9 +2737,52 @@ int unregister_console(struct console *console)
 }
 EXPORT_SYMBOL(unregister_console);
 
+/* Kthread which takes over printing from a CPU which asks for help */
+static int printing_task(void *arg)
+{
+	unsigned long flags;
+	DEFINE_WAIT(wait);
+
+	while (1) {
+		prepare_to_wait_exclusive(&print_queue, &wait,
+					  TASK_INTERRUPTIBLE);
+		schedule();
+		finish_wait(&print_queue, &wait);
+		if (kthread_should_stop())
+			break;
+		/*
+		 * Disable preemption so that we do not get scheduled away from
+		 * the CPU after we get print_lock and before we are finished
+		 * with printing.
+		 */
+		preempt_disable();
+		atomic_inc(&printing_tasks_spinning);
+		/*
+		 * Store printing_tasks_spinning value before we spin. Matches
+		 * the barrier in cpu_stop_printing().
+		 */
+		smp_mb__after_atomic();
+		/*
+		 * Wait for currently printing thread to complete. We spin on
+		 * print_lock instead of waiting on console_sem since we don't
+		 * want to sleep once we got scheduled to make sure we take
+		 * over printing without depending on the scheduler.
+		 */
+		spin_lock_irqsave(&print_lock, flags);
+		atomic_dec(&printing_tasks_spinning);
+		spin_unlock_irqrestore(&print_lock, flags);
+		if (console_trylock())
+			console_unlock();
+		preempt_enable();
+	}
+	return 0;
+}
+
 static int __init printk_late_init(void)
 {
 	struct console *con;
+	int i;
+	struct task_struct *task;
 
 	for_each_console(con) {
 		if (!keep_bootcon && con->flags & CON_BOOT) {
@@ -2664,6 +2790,19 @@ static int __init printk_late_init(void)
 		}
 	}
 	hotcpu_notifier(console_cpu_notify, 0);
+
+	/* Does any handover of printing have any sence? */
+	if (num_possible_cpus() <= 1)
+		return 0;
+
+	for (i = 0; i < PRINTING_TASKS; i++) {
+		task = kthread_run(printing_task, NULL, "print/%d", i);
+		if (IS_ERR(task)) {
+			pr_err("printk: Cannot create printing thread: %ld\n",
+			       PTR_ERR(task));
+		}
+	}
+
 	return 0;
 }
 late_initcall(printk_late_init);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-03-02  9:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-10 14:52 [PATCH 1/7] printk: Hand over printing to console if printing too long Sergey Senozhatsky
2015-12-10 15:24 ` Sergey Senozhatsky
2015-12-11  4:27 ` Sergey Senozhatsky
2015-12-11  6:29   ` Sergey Senozhatsky
2015-12-22 13:47 ` Jan Kara
2015-12-22 14:48   ` Sergey Senozhatsky
2015-12-23  1:54   ` Sergey Senozhatsky
2015-12-23  3:37     ` Sergey Senozhatsky
2015-12-23  3:57       ` Sergey Senozhatsky
2015-12-23  4:15         ` Sergey Senozhatsky
2016-01-05 14:37     ` Jan Kara
2016-01-06  1:41       ` Sergey Senozhatsky
2016-01-06  6:48       ` Sergey Senozhatsky
2016-01-06 12:25         ` Jan Kara
2016-01-11 13:25           ` Sergey Senozhatsky
2015-12-31  2:44   ` Sergey Senozhatsky
2015-12-31  3:13     ` Sergey Senozhatsky
2015-12-31  4:58       ` Sergey Senozhatsky
2016-01-05 14:48         ` Jan Kara
2016-01-06  3:38           ` Sergey Senozhatsky
2016-01-06  8:36             ` Sergey Senozhatsky
2016-01-06 10:21               ` Jan Kara
2016-01-06 11:10                 ` Sergey Senozhatsky
2016-01-11 12:54   ` Petr Mladek
2016-01-12 14:00     ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2015-10-26  4:52 [PATCH 0/6 v2] printk: Softlockup avoidance Jan Kara
2015-10-26  4:52 ` [PATCH 1/7] printk: Hand over printing to console if printing too long Jan Kara
2016-03-01 17:22   ` Denys Vlasenko
2016-03-02  9:30     ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.