From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756226AbcCaLMk (ORCPT ); Thu, 31 Mar 2016 07:12:40 -0400 Received: from mx2.suse.de ([195.135.220.15]:53515 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750971AbcCaLMe (ORCPT ); Thu, 31 Mar 2016 07:12:34 -0400 Date: Thu, 31 Mar 2016 13:12:29 +0200 From: Petr Mladek To: Sergey Senozhatsky Cc: Andrew Morton , Jan Kara , Tejun Heo , Tetsuo Handa , linux-kernel@vger.kernel.org, Byungchul Park , Sergey Senozhatsky , Jan Kara Subject: Re: [RFC][PATCH v8 1/2] printk: Make printk() completely async Message-ID: <20160331111229.GB1023@pathway.suse.cz> References: <1458834203-3392-1-git-send-email-sergey.senozhatsky@gmail.com> <1458834203-3392-2-git-send-email-sergey.senozhatsky@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1458834203-3392-2-git-send-email-sergey.senozhatsky@gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 2016-03-25 00:43:22, Sergey Senozhatsky wrote: > From: Jan Kara > > Currently, printk() sometimes waits for message to be printed to console > and sometimes it does not (when console_sem is held by some other > process). In case printk() grabs console_sem and starts printing to > console, it prints messages from kernel printk buffer until the buffer > is empty. When serial console is attached, printing is slow and thus > other CPUs in the system have plenty of time to append new messages to > the buffer while one CPU is printing. Thus the CPU can spend unbounded > amount of time doing printing in console_unlock(). This is especially > serious problem if the printk() calling console_unlock() was called with > interrupts disabled. > > In practice users have observed a CPU can spend tens of seconds printing > in console_unlock() (usually during boot when hundreds of SCSI devices > are discovered) resulting in RCU stalls (CPU doing printing doesn't > reach quiescent state for a long time), softlockup reports (IPIs for the > printing CPU don't get served and thus other CPUs are spinning waiting > for the printing CPU to process IPIs), and eventually a machine death > (as messages from stalls and lockups append to printk buffer faster than > we are able to print). So these machines are unable to boot with serial > console attached. Another observed issue is that due to slow printk, > hardware discovery is slow and udev times out before kernel manages to > discover all the attached HW. Also during artificial stress testing SATA > disk disappears from the system because its interrupts aren't served for > too long. > > This patch makes printk() completely asynchronous (similar to what > printk_deferred() did until now). It appends message to the kernel > printk buffer and wake_up()s a special dedicated kthread to do the > printing to console. This has the advantage that printing always happens > from a schedulable contex and thus we don't lockup any particular CPU or > even interrupts. Also it has the advantage that printk() is fast and > thus kernel booting is not slowed down by slow serial console. > Disadvantage of this method is that in case of crash there is higher > chance that important messages won't appear in console output (we may > need working scheduling to print message to console). We somewhat > mitigate this risk by switching printk to the original method of > immediate printing to console if oops is in progress. Also for > debugging purposes we provide printk.synchronous kernel parameter which > resorts to the original printk behavior. > > printk() is expected to work under different conditions and in different > scenarios, including corner cases of OOM when all of the workers are busy > (e.g. allocating memory), thus printk() uses its own dedicated printing > kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit > set we potentially can receive delays in printing until workqueue > declares a ->mayday, as noted by Tetsuo Handa). > > Signed-off-by: Jan Kara > Signed-off-by: Sergey Senozhatsky > --- > Documentation/kernel-parameters.txt | 10 ++++ > kernel/printk/printk.c | 91 ++++++++++++++++++++++++++++++++++--- > 2 files changed, 94 insertions(+), 7 deletions(-) > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt > index ecc74fa..4745e94 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > printk.time= Show timing data prefixed to each printk message line > Format: (1/Y/y=enable, 0/N/n=disable) > > + printk.synchronous= > + By default kernel messages are printed to console > + asynchronously (except during early boot or when oops > + is happening). That avoids kernel stalling behind slow > + serial console and thus avoids softlockups, interrupt > + timeouts, or userspace timing out during heavy printing. > + However for debugging problems, printing messages to > + console immediately may be desirable. This option > + enables such behavior. > + > processor.max_cstate= [HW,ACPI] > Limit processor to maximum C-state > max_cstate=9 overrides any DMI blacklist limit. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index bfbf284..a984c5a 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -46,6 +46,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > > +/* > + * When true, printing to console will happen synchronously. > + * The default value on UP systems is 'true'. > + */ > +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP); > +module_param_named(synchronous, printk_sync, bool, S_IRUGO); > +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); > + > +/* Printing kthread for async printk */ > +static struct task_struct *printk_kthread; > +/* When `true' - printing thread has messages to print */ > +static bool need_flush_console; > + > /* Return log buffer address */ > char *log_buf_addr_get(void) > { > @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level, > const char *dict, size_t dictlen, > const char *fmt, va_list args) > { > + /* cpu currently holding logbuf_lock in this function */ > + static unsigned int logbuf_cpu = UINT_MAX; > static bool recursion_bug; > static char textbuf[LOG_LINE_MAX]; > char *text = textbuf; > @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level, > int this_cpu; > int printed_len = 0; > bool in_sched = false; > - /* cpu currently holding logbuf_lock in this function */ > - static unsigned int logbuf_cpu = UINT_MAX; > + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; > > if (level == LOGLEVEL_SCHED) { > level = LOGLEVEL_DEFAULT; > @@ -1655,6 +1670,14 @@ asmlinkage int vprintk_emit(int facility, int level, > raw_spin_lock(&logbuf_lock); > logbuf_cpu = this_cpu; > > + /* > + * Set printing kthread sleep condition early, under the > + * logbuf_lock, so it (if RUNNING) will go to console_lock() > + * and spin on logbuf_lock. > + */ > + if (!in_panic && printk_kthread && !need_flush_console) > + need_flush_console = true; I would remove the if-statement and always set it: + It does not matter if we set it in panic. It will not affect anything. + The check for printk_kthread is racy. It might be false here and it might be true later when check whether to wakeup the kthread or try to get console directly. + We might set it true even when it was true before. I think that this was an attempt to avoid a spurious wake up. But we solve it better way now. > if (unlikely(recursion_bug)) { > static const char recursion_msg[] = > "BUG: recent printk recursion!"; > @@ -1757,12 +1780,28 @@ asmlinkage int vprintk_emit(int facility, int level, > if (!in_sched) { > lockdep_off(); > /* > - * Try to acquire and then immediately release the console > - * semaphore. The release will print out buffers and wake up > - * /dev/kmsg and syslog() users. > + * By default we print message to console asynchronously so > + * that kernel doesn't get stalled due to slow serial console. > + * That can lead to softlockups, lost interrupts, or userspace > + * timing out under heavy printing load. > + * > + * However we resort to synchronous printing of messages during > + * early boot, when synchronous printing was explicitly > + * requested by kernel parameter, or when console_verbose() was > + * called to print everything during panic / oops. > */ > - if (console_trylock()) > - console_unlock(); > + if (!in_panic && printk_kthread) { > + /* Offload printing to a schedulable context. */ > + wake_up_process(printk_kthread); > + } else { > + /* > + * Try to acquire and then immediately release the > + * console semaphore. The release will print out > + * buffers and wake up /dev/kmsg and syslog() users. > + */ > + if (console_trylock()) > + console_unlock(); > + } > lockdep_on(); > } > > @@ -1899,6 +1938,8 @@ static size_t cont_print_text(char *text, size_t size) { return 0; } > /* Still needs to be defined for users */ > DEFINE_PER_CPU(printk_func_t, printk_func); > > +static struct task_struct *printk_kthread; > +static bool need_flush_console; > #endif /* CONFIG_PRINTK */ > > #ifdef CONFIG_EARLY_PRINTK > @@ -2337,6 +2378,8 @@ skip: > */ > raw_spin_lock(&logbuf_lock); > retry = console_seq != log_next_seq; > + if (!retry && printk_kthread) > + need_flush_console = false; Similar here. I remove the if-statement and always set it. We will either retry or it should be false anyway. > raw_spin_unlock_irqrestore(&logbuf_lock, flags); > > if (retry && console_trylock()) > @@ -2722,6 +2765,40 @@ static int __init printk_late_init(void) > late_initcall(printk_late_init); > > #if defined CONFIG_PRINTK > +static int printk_kthread_func(void *data) > +{ > + while (1) { > + set_current_state(TASK_INTERRUPTIBLE); > + if (!need_flush_console) > + schedule(); > + > + __set_current_state(TASK_RUNNING); We still must do here: need_flush_console = false; Othrerwise, we might start "busy" cycling. cosole_unlock() sometimes returns earlly, e.g. when console_suspended is set or !can_use_console() returns true. Sigh, the handling of "need_flush_console" is a bit strange. Part of the logic depends on logbuf_lock and the other part must be lockless. I think that this lockless assigment deserve a comment, e.g. /* * Avoid an infinite loop when console_unlock() cannot * access consoles, e.g. because of a suspend. We * could get asleep here. Someone else will call * consoles if conditions change. */ Also another name might help. If we set it false here, the value does describe a global state. The variable describes if this kthread needs to flush the console. So, more precise name would be printk_kthread_need_flush_console I think that we are close. I really like the current state of the patch and how minimalistic it is. Best Regards, Petr