From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751292AbcCFGbj (ORCPT ); Sun, 6 Mar 2016 01:31:39 -0500 Received: from mail-pa0-f54.google.com ([209.85.220.54]:32920 "EHLO mail-pa0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751132AbcCFGbc (ORCPT ); Sun, 6 Mar 2016 01:31:32 -0500 Date: Sun, 6 Mar 2016 15:32:51 +0900 From: Sergey Senozhatsky To: Andrew Morton Cc: Jan Kara , Petr Mladek , Tejun Heo , Tetsuo Handa , linux-kernel@vger.kernel.org, Sergey Senozhatsky , Sergey Senozhatsky , Jan Kara Subject: Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async Message-ID: <20160306063251.GA493@swordfish> References: <1457175338-1665-1-git-send-email-sergey.senozhatsky@gmail.com> <1457175338-1665-2-git-send-email-sergey.senozhatsky@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1457175338-1665-2-git-send-email-sergey.senozhatsky@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (03/05/16 19:55), Sergey Senozhatsky wrote: [..] > +static int __init init_printk_workqueue(void) > +{ > + if (printk_sync) > + return 0; > + > + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); > + /* > + * Fallback to one of system-wide workqueues if we failed to > + * allocate a dedicated one. console_unlock() may take some > + * time to flush all of the messages, so we better use > + * system_long_wq. > + */ > + if (!printk_wq) { > + pr_err("printk: Fallback to system_long workqueue\n"); > + printk_wq = system_long_wq; > + } > + > + return 0; > +} > +early_initcall(init_printk_workqueue); ^^^^^^ better be late_initcall(). init_workqueues(), which creates pwq_cache kmem_cache, is early initcall. ===8<===8<===8<==== >>From d9cf33958febd863ceb25c59d447aa9500427c30 Mon Sep 17 00:00:00 2001 From: Jan Kara Subject: [PATCH 1/2] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory). Thus by default printk() uses its own dedicated workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq (console_unlock() is time unbound) only if it has failed to allocate printk_wq. Another thing to mention, is that deferred printk() messages may appear before printk_wq is allocated, so in the very beginning we have to printk deferred messages the old way -- in IRQ context. Signed-off-by: Jan Kara Signed-off-by: Sergey Senozhatsky --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 194 +++++++++++++++++++++++++----------- 2 files changed, 146 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d5fd844..d287fd7 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include #include #include +#include #include #include @@ -284,6 +285,108 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* + * When true, printing to console will happen synchronously unless someone else + * is already printing messages. + */ +static bool __read_mostly printk_sync; +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +static struct workqueue_struct *printk_wq; + +#define PRINTK_PENDING_WAKEUP (1<<0) +#define PRINTK_PENDING_IRQ_OUTPUT (1<<1) +#define PRINTK_PENDING_WQ_OUTPUT (1<<2) + +static DEFINE_PER_CPU(int, printk_pending); + +static int __init init_printk_workqueue(void) +{ + if (printk_sync) + return 0; + + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); + /* + * Fallback to one of system-wide workqueues if we failed to + * allocate a dedicated one. console_unlock() may take some + * time to flush all of the messages, so we better use + * system_long_wq. + */ + if (!printk_wq) { + pr_err("printk: Fallback to system_long workqueue\n"); + printk_wq = system_long_wq; + } + + return 0; +} +late_initcall(init_printk_workqueue); + +static void printing_work_func(struct work_struct *work) +{ + console_lock(); + console_unlock(); +} + +static DECLARE_WORK(printing_work, printing_work_func); + +static void wake_up_klogd_work_func(struct irq_work *irq_work) +{ + int pending = __this_cpu_xchg(printk_pending, 0); + + /* + * We just schedule regular work to do the printing from irq work. We + * don't want to do printing here directly as that happens with + * interrupts disabled and thus is bad for interrupt latency. We also + * don't want to queue regular work from vprintk_emit() as that gets + * called in various difficult contexts where schedule_work() could + * deadlock. + */ + if (pending & PRINTK_PENDING_WQ_OUTPUT) + queue_work(printk_wq, &printing_work); + + /* + * Deferred printk messages may come too early when we don't have + * workqueue allocated yet. We can't print deferred messages directly, + * because this may deadlock, so in the very beginning we have to + * print deferred messages from IRQ context. + */ + if (pending & PRINTK_PENDING_IRQ_OUTPUT) { + if (console_trylock()) + console_unlock(); + } + + if (pending & PRINTK_PENDING_WAKEUP) + wake_up_interruptible(&log_wait); +} + +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { + .func = wake_up_klogd_work_func, + .flags = IRQ_WORK_LAZY, +}; + +void wake_up_klogd(void) +{ + preempt_disable(); + if (waitqueue_active(&log_wait)) { + this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } + preempt_enable(); +} + +int printk_deferred(const char *fmt, ...) +{ + va_list args; + int r; + + va_start(args, fmt); + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + va_end(args); + + return r; +} + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1609,6 +1712,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1618,12 +1723,13 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; + bool sync_print = printk_sync; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; in_sched = true; + sync_print = false; } boot_delay_msec(level); @@ -1752,10 +1858,36 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + /* + * By default we print message to console asynchronously so that kernel + * doesn't get stalled due to slow serial console. That can lead to + * softlockups, lost interrupts, or userspace timing out under heavy + * printing load. + * + * However we resort to synchronous printing of messages during early + * boot, when synchronous printing was explicitly requested by + * kernel parameter, or when console_verbose() was called to print + * everything during panic / oops. + * + * Deferred messages (in_sched), however, must not be printed + * synchronously. + */ + if (!sync_print) { + if (printk_wq && !in_panic) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_WQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else if (in_sched) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_IRQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else { + sync_print = true; + } + } local_irq_restore(flags); - /* If called from the scheduler, we can not call up(). */ - if (!in_sched) { + if (sync_print) { lockdep_off(); /* * Try to acquire and then immediately release the console @@ -2724,60 +2856,6 @@ late_initcall(printk_late_init); #if defined CONFIG_PRINTK /* - * Delayed printk version, for scheduler-internal messages: - */ -#define PRINTK_PENDING_WAKEUP 0x01 -#define PRINTK_PENDING_OUTPUT 0x02 - -static DEFINE_PER_CPU(int, printk_pending); - -static void wake_up_klogd_work_func(struct irq_work *irq_work) -{ - int pending = __this_cpu_xchg(printk_pending, 0); - - if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); - } - - if (pending & PRINTK_PENDING_WAKEUP) - wake_up_interruptible(&log_wait); -} - -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { - .func = wake_up_klogd_work_func, - .flags = IRQ_WORK_LAZY, -}; - -void wake_up_klogd(void) -{ - preempt_disable(); - if (waitqueue_active(&log_wait)) { - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - } - preempt_enable(); -} - -int printk_deferred(const char *fmt, ...) -{ - va_list args; - int r; - - preempt_disable(); - va_start(args, fmt); - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - va_end(args); - - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - preempt_enable(); - - return r; -} - -/* * printk rate limiting, lifted from the networking subsystem. * * This enforces a rate limit: not more than 10 kernel messages -- 2.8.0.rc0.1.gd285ab0