From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932965AbbLVOuR (ORCPT ); Tue, 22 Dec 2015 09:50:17 -0500 Received: from mail-pf0-f181.google.com ([209.85.192.181]:34496 "EHLO mail-pf0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752548AbbLVOuO (ORCPT ); Tue, 22 Dec 2015 09:50:14 -0500 Date: Tue, 22 Dec 2015 23:48:38 +0900 From: Sergey Senozhatsky To: Jan Kara Cc: Sergey Senozhatsky , Andrew Morton , Petr Mladek , KY Sri nivasan , Steven Rostedt , linux-kernel@vger.kernel.org, Sergey Senozhatsky Subject: Re: [PATCH 1/7] printk: Hand over printing to console if printing too long Message-ID: <20151222144838.GA611@swordfish> References: <20151210145251.GA540@swordfish> <20151222134730.GD7266@quack.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151222134730.GD7266@quack.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (12/22/15 14:47), Jan Kara wrote: [..] > Thanks for looking into my patches and sorry for replying with a delay. As > I wrote in my previous email [1] even the referenced patches are not quite > enough. Over last few days I have worked on redoing the stuff as we > discussed with Linus and Andrew at Kernel Summit and I have new patches > which are working fine for me. I still want to test them on some machines > having real issues with udev during boot but so far stress-testing with > serial console slowed down to ~1000 chars/sec on other machines and VMs > looks promising. > > I'm attaching them in case you want to have a look. They are on top of > Tejun's patch adding cond_resched() (which is essential). I'll officially > submit the patches once the testing is finished (but I'm not sure when I > get to the problematic HW...). > Hello, Thanks a lot! Will take a look. -ss > From 2e9675abbfc0df4a24a8c760c58e8150b9a31259 Mon Sep 17 00:00:00 2001 > From: Jan Kara > Date: Mon, 21 Dec 2015 13:10:31 +0100 > Subject: [PATCH 1/2] printk: Make printk() completely async > > Currently, printk() sometimes waits for message to be printed to console > and sometimes it does not (when console_sem is held by some other > process). In case printk() grabs console_sem and starts printing to > console, it prints messages from kernel printk buffer until the buffer > is empty. When serial console is attached, printing is slow and thus > other CPUs in the system have plenty of time to append new messages to > the buffer while one CPU is printing. Thus the CPU can spend unbounded > amount of time doing printing in console_unlock(). This is especially > serious problem if the printk() calling console_unlock() was called with > interrupts disabled. > > In practice users have observed a CPU can spend tens of seconds printing > in console_unlock() (usually during boot when hundreds of SCSI devices > are discovered) resulting in RCU stalls (CPU doing printing doesn't > reach quiescent state for a long time), softlockup reports (IPIs for the > printing CPU don't get served and thus other CPUs are spinning waiting > for the printing CPU to process IPIs), and eventually a machine death > (as messages from stalls and lockups append to printk buffer faster than > we are able to print). So these machines are unable to boot with serial > console attached. Another observed issue is that due to slow printk, > hardware discovery is slow and udev times out before kernel manages to > discover all the attached HW. Also during artificial stress testing SATA > disk disappears from the system because its interrupts aren't served for > too long. > > This patch makes printk() completely asynchronous (similar to what > printk_deferred() did until now). It appends message to the kernel > printk buffer and queues work to do the printing to console. This has > the advantage that printing always happens from a schedulable contex and > thus we don't lockup any particular CPU or even interrupts. Also it has > the advantage that printk() is fast and thus kernel booting is not > slowed down by slow serial console. Disadvantage of this method is that > in case of crash there is higher chance that important messages won't > appear in console output (we may need working scheduling to print > message to console). We somewhat mitigate this risk by switching printk > to the original method of immediate printing to console if oops is in > progress. Also for debugging purposes we provide printk.synchronous > kernel parameter which resorts to the original printk behavior. > > Signed-off-by: Jan Kara > --- > Documentation/kernel-parameters.txt | 10 +++ > kernel/printk/printk.c | 144 +++++++++++++++++++++--------------- > 2 files changed, 95 insertions(+), 59 deletions(-) > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index 742f69d18fc8..4cf1bddeffc7 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -3000,6 +3000,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > printk.time= Show timing data prefixed to each printk message line > Format: (1/Y/y=enable, 0/N/n=disable) > > + printk.synchronous= > + By default kernel messages are printed to console > + asynchronously (except during early boot or when oops > + is happening). That avoids kernel stalling behind slow > + serial console and thus avoids softlockups, interrupt > + timeouts, or userspace timing out during heavy printing. > + However for debugging problems, printing messages to > + console immediately may be desirable. This option > + enables such behavior. > + > processor.max_cstate= [HW,ACPI] > Limit processor to maximum C-state > max_cstate=9 overrides any DMI blacklist limit. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 299c2f0e7350..d455d1bd0d2c 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -283,6 +283,73 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > > +/* > + * When true, printing to console will happen synchronously unless someone else > + * is already printing messages. > + */ > +static bool __read_mostly printk_sync; > +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR); > +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); > + > +#define PRINTK_PENDING_WAKEUP 0x01 > +#define PRINTK_PENDING_OUTPUT 0x02 > + > +static DEFINE_PER_CPU(int, printk_pending); > + > +static void printing_work_func(struct work_struct *work) > +{ > + console_lock(); > + console_unlock(); > +} > + > +static DECLARE_WORK(printing_work, printing_work_func); > + > +static void wake_up_klogd_work_func(struct irq_work *irq_work) > +{ > + int pending = __this_cpu_xchg(printk_pending, 0); > + > + /* > + * We just schedule regular work to do the printing from irq work. We > + * don't want to do printing here directly as that happens with > + * interrupts disabled and thus is bad for interrupt latency. We also > + * don't want to queue regular work from vprintk_emit() as that gets > + * called in various difficult contexts where schedule_work() could > + * deadlock. > + */ > + if (pending & PRINTK_PENDING_OUTPUT) > + schedule_work(&printing_work); > + > + if (pending & PRINTK_PENDING_WAKEUP) > + wake_up_interruptible(&log_wait); > +} > + > +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { > + .func = wake_up_klogd_work_func, > + .flags = IRQ_WORK_LAZY, > +}; > + > +void wake_up_klogd(void) > +{ > + preempt_disable(); > + if (waitqueue_active(&log_wait)) { > + this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > + } > + preempt_enable(); > +} > + > +int printk_deferred(const char *fmt, ...) > +{ > + va_list args; > + int r; > + > + va_start(args, fmt); > + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); > + va_end(args); > + > + return r; > +} > + > /* Return log buffer address */ > char *log_buf_addr_get(void) > { > @@ -1668,15 +1735,14 @@ asmlinkage int vprintk_emit(int facility, int level, > unsigned long flags; > int this_cpu; > int printed_len = 0; > - bool in_sched = false; > + bool sync_print = printk_sync; > /* cpu currently holding logbuf_lock in this function */ > static unsigned int logbuf_cpu = UINT_MAX; > > if (level == LOGLEVEL_SCHED) { > level = LOGLEVEL_DEFAULT; > - in_sched = true; > + sync_print = false; > } > - > boot_delay_msec(level); > printk_delay(); > > @@ -1803,10 +1869,24 @@ asmlinkage int vprintk_emit(int facility, int level, > logbuf_cpu = UINT_MAX; > raw_spin_unlock(&logbuf_lock); > lockdep_on(); > + /* > + * By default we print message to console asynchronously so that kernel > + * doesn't get stalled due to slow serial console. That can lead to > + * softlockups, lost interrupts, or userspace timing out under heavy > + * printing load. > + * > + * However we resort to synchronous printing of messages during early > + * boot, when oops is in progress, or when synchronous printing was > + * explicitely requested by kernel parameter. > + */ > + if (keventd_up() && !oops_in_progress && !sync_print) { > + __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); > + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > + } else > + sync_print = true; > local_irq_restore(flags); > > - /* If called from the scheduler, we can not call up(). */ > - if (!in_sched) { > + if (sync_print) { > lockdep_off(); > /* > * Disable preemption to avoid being preempted while holding > @@ -2688,60 +2768,6 @@ late_initcall(printk_late_init); > > #if defined CONFIG_PRINTK > /* > - * Delayed printk version, for scheduler-internal messages: > - */ > -#define PRINTK_PENDING_WAKEUP 0x01 > -#define PRINTK_PENDING_OUTPUT 0x02 > - > -static DEFINE_PER_CPU(int, printk_pending); > - > -static void wake_up_klogd_work_func(struct irq_work *irq_work) > -{ > - int pending = __this_cpu_xchg(printk_pending, 0); > - > - if (pending & PRINTK_PENDING_OUTPUT) { > - /* If trylock fails, someone else is doing the printing */ > - if (console_trylock()) > - console_unlock(); > - } > - > - if (pending & PRINTK_PENDING_WAKEUP) > - wake_up_interruptible(&log_wait); > -} > - > -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { > - .func = wake_up_klogd_work_func, > - .flags = IRQ_WORK_LAZY, > -}; > - > -void wake_up_klogd(void) > -{ > - preempt_disable(); > - if (waitqueue_active(&log_wait)) { > - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); > - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > - } > - preempt_enable(); > -} > - > -int printk_deferred(const char *fmt, ...) > -{ > - va_list args; > - int r; > - > - preempt_disable(); > - va_start(args, fmt); > - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); > - va_end(args); > - > - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); > - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > - preempt_enable(); > - > - return r; > -} > - > -/* > * printk rate limiting, lifted from the networking subsystem. > * > * This enforces a rate limit: not more than 10 kernel messages > -- > 2.6.2 > > From be116ae18f15f0d2d05ddf0b53eaac184943d312 Mon Sep 17 00:00:00 2001 > From: Jan Kara > Date: Mon, 21 Dec 2015 14:26:13 +0100 > Subject: [PATCH 2/2] printk: Skip messages on oops > > When there are too many messages in the kernel printk buffer it can take > very long to print them to console (especially when using slow serial > console). This is undesirable during oops so when we encounter oops and > there are more than 100 messages to print, print just the newest 100 > messages and then the oops message. > > Signed-off-by: Jan Kara > --- > kernel/printk/printk.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index d455d1bd0d2c..fc67ab70e9c7 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -262,6 +262,9 @@ static u64 console_seq; > static u32 console_idx; > static enum log_flags console_prev; > > +/* current record sequence when oops happened */ > +static u64 oops_start_seq; > + > /* the next printk record to read after the last 'clear' command */ > static u64 clear_seq; > static u32 clear_idx; > @@ -1783,6 +1786,8 @@ asmlinkage int vprintk_emit(int facility, int level, > NULL, 0, recursion_msg, > strlen(recursion_msg)); > } > + if (oops_in_progress && !sync_print && !oops_start_seq) > + oops_start_seq = log_next_seq; > > /* > * The printf needs to come first; we need the syslog > @@ -2292,6 +2297,12 @@ out: > raw_spin_unlock_irqrestore(&logbuf_lock, flags); > } > > +/* > + * When oops happens and there are more messages to be printed in the printk > + * buffer that this, skip some mesages and print only this many newest messages. > + */ > +#define PRINT_MSGS_BEFORE_OOPS 100 > + > /** > * console_unlock - unlock the console system > * > @@ -2348,7 +2359,28 @@ again: > seen_seq = log_next_seq; > } > > - if (console_seq < log_first_seq) { > + /* > + * If oops happened and there are more than > + * PRINT_MSGS_BEFORE_OOPS messages pending before oops message, > + * skip them to make oops appear faster. > + */ > + if (oops_start_seq && > + console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) { > + len = sprintf(text, > + "** %u printk messages dropped due to oops ** ", > + (unsigned)(oops_start_seq - console_seq - > + PRINT_MSGS_BEFORE_OOPS)); > + if (console_seq < log_first_seq) { > + console_seq = log_first_seq; > + console_idx = log_first_idx; > + } > + while (console_seq < > + oops_start_seq - PRINT_MSGS_BEFORE_OOPS) { > + console_idx = log_next(console_idx); > + console_seq++; > + } > + console_prev = 0; > + } else if (console_seq < log_first_seq) { > len = sprintf(text, "** %u printk messages dropped ** ", > (unsigned)(log_first_seq - console_seq)); > > -- > 2.6.2 >