* [PATCH v10 0/2] printk: Make printk() completely async @ 2016-04-04 16:57 Sergey Senozhatsky 2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky 2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky 0 siblings, 2 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky Hello, This patch set makes printk() completely asynchronous: new messages are getting upended to the kernel printk buffer, but instead of 'direct' printing the actual print job is performed by a dedicated kthread. This has the advantage that printing always happens from a schedulable context and thus we don't lockup any particular CPU or even interrupts. against next-20160404 v10: -- simplify printk_kthread_need_flush_console (Jan, Petr) v9: -- move need_flush_console assignment down in vprintk_emit (Jan) -- simplify need_flush_console assignment rules (Petr) -- clear need_flush_console in printing function (Petr) -- rename need_flush_console (Petr) v8: -- rename kthread printing function (Petr) -- clear need_flush_console in console_unlock() under logbuf (Petr) v7: -- do not set global printk_sync in panic in vrintk_emit() (Petr) -- simplify vprintk_emit(). drop some of local variables (Petr) -- move handling of LOGLEVEL_SCHED messages back to printk_deferred() so we wake_up_process()/console_trylock() in vprintk_emit() only for !in_sched messages v6: -- move wake_up_process out of logbuf lock (Jan, Byungchul) -- do not disable async printk in recursion handling code. -- rebase against next-20160321 (w/NMI patches) v5: -- make printk.synchronous RO (Petr) -- make printing_func() correct and do not use wait_queue (Petr) -- do not panic() when can't allocate printing thread (Petr) -- do not wake_up_process() only in IRQ, prefer vprintk_emit() (Jan) -- move wake_up_klogd_work_func() to a separate patch (Petr) -- move wake_up_process() under logbuf lock so printk recursion logic can help us out -- switch to sync_print mode if printk recursion occured -- drop "printk: Skip messages on oops" patch v4: -- do not directly wake_up() the printing kthread from vprintk_emit(), need to go via IRQ->wake_up() to avoid sched deadlocks (Jan) v3: -- use a dedicated kthread for printing instead of using wq (Jan, Tetsuo, Tejun) v2: - use dedicated printk workqueue with WQ_MEM_RECLAIM bit - fallback to system-wide workqueue only if allocation of printk_wq has failed - do not use system_wq as a fallback wq. both console_lock() and onsole_unlock() can spend a significant amount of time; so we need to use system_long_wq. - rework sync/!sync detection logic a) we can have deferred (in_sched) messages before we allocate printk_wq, so the only way to handle those messages is via IRQ context b) even in printk.synchronous mode, deferred messages must not be printed directly, and should go via IRQ context c) even if we allocated printk_wq and have !sync_printk mode, we must route deferred messages via IRQ context - so this adds additional bool flags to vprint_emit() and introduces a new pending bit to `printk_pending' - fix build on !PRINTK configs Jan Kara (2): printk: Make printk() completely async printk: Make wake_up_klogd_work_func() async Documentation/kernel-parameters.txt | 10 ++++ kernel/printk/printk.c | 100 ++++++++++++++++++++++++++++++++---- 2 files changed, 100 insertions(+), 10 deletions(-) -- 2.8.0 ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v10 1/2] printk: Make printk() completely async 2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky @ 2016-04-04 16:57 ` Sergey Senozhatsky 2016-04-04 22:51 ` Andrew Morton 2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky 1 sibling, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara From: Jan Kara <jack@suse.cz> Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and wake_up()s a special dedicated kthread to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory), thus printk() uses its own dedicated printing kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit set we potentially can receive delays in printing until workqueue declares a ->mayday, as noted by Tetsuo Handa). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/kernel-parameters.txt | 10 +++++ kernel/printk/printk.c | 87 ++++++++++++++++++++++++++++++++++--- 2 files changed, 90 insertions(+), 7 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 3d28b50..c23a5bd 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index bfbf284..f63dfe4 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include <linux/utsname.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/kthread.h> #include <asm/uaccess.h> #include <asm-generic/sections.h> @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* + * When true, printing to console will happen synchronously. + * The default value on UP systems is 'true'. + */ +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP); +module_param_named(synchronous, printk_sync, bool, S_IRUGO); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +/* Printing kthread for async printk */ +static struct task_struct *printk_kthread; +/* When `true' - printing thread has messages to print */ +static bool printk_kthread_need_flush_console; + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; @@ -1757,12 +1772,29 @@ asmlinkage int vprintk_emit(int facility, int level, if (!in_sched) { lockdep_off(); /* - * Try to acquire and then immediately release the console - * semaphore. The release will print out buffers and wake up - * /dev/kmsg and syslog() users. + * By default we print message to console asynchronously so + * that kernel doesn't get stalled due to slow serial console. + * That can lead to softlockups, lost interrupts, or userspace + * timing out under heavy printing load. + * + * However we resort to synchronous printing of messages during + * early boot, when synchronous printing was explicitly + * requested by kernel parameter, or when console_verbose() was + * called to print everything during panic / oops. */ - if (console_trylock()) - console_unlock(); + if (!in_panic && printk_kthread) { + /* Offload printing to a schedulable context. */ + printk_kthread_need_flush_console = true; + wake_up_process(printk_kthread); + } else { + /* + * Try to acquire and then immediately release the + * console semaphore. The release will print out + * buffers and wake up /dev/kmsg and syslog() users. + */ + if (console_trylock()) + console_unlock(); + } lockdep_on(); } @@ -2722,6 +2754,47 @@ static int __init printk_late_init(void) late_initcall(printk_late_init); #if defined CONFIG_PRINTK +static int printk_kthread_func(void *data) +{ + while (1) { + set_current_state(TASK_INTERRUPTIBLE); + if (!printk_kthread_need_flush_console) + schedule(); + + __set_current_state(TASK_RUNNING); + /* + * Avoid an infinite loop when console_unlock() cannot + * access consoles, e.g. because console_suspended is + * true. schedule(), someone else will print the messages + * from resume_console(). + */ + printk_kthread_need_flush_console = false; + + console_lock(); + console_unlock(); + } + + return 0; +} + +static int __init init_printk_kthread(void) +{ + struct task_struct *thread; + + if (printk_sync) + return 0; + + thread = kthread_run(printk_kthread_func, NULL, "printk"); + if (IS_ERR(thread)) { + pr_err("printk: unable to create printing thread\n"); + printk_sync = true; + } else { + printk_kthread = thread; + } + return 0; +} +late_initcall(init_printk_kthread); + /* * Delayed printk version, for scheduler-internal messages: */ -- 2.8.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky @ 2016-04-04 22:51 ` Andrew Morton 2016-04-05 5:17 ` Sergey Senozhatsky 2016-04-06 8:27 ` Jan Kara 0 siblings, 2 replies; 39+ messages in thread From: Andrew Morton @ 2016-04-04 22:51 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Jan Kara On Tue, 5 Apr 2016 01:57:27 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > From: Jan Kara <jack@suse.cz> > > Currently, printk() sometimes waits for message to be printed to console > and sometimes it does not (when console_sem is held by some other > process). In case printk() grabs console_sem and starts printing to > console, it prints messages from kernel printk buffer until the buffer > is empty. When serial console is attached, printing is slow and thus > other CPUs in the system have plenty of time to append new messages to > the buffer while one CPU is printing. Thus the CPU can spend unbounded > amount of time doing printing in console_unlock(). This is especially > serious problem if the printk() calling console_unlock() was called with > interrupts disabled. > > In practice users have observed a CPU can spend tens of seconds printing > in console_unlock() (usually during boot when hundreds of SCSI devices > are discovered) resulting in RCU stalls (CPU doing printing doesn't > reach quiescent state for a long time), softlockup reports (IPIs for the > printing CPU don't get served and thus other CPUs are spinning waiting > for the printing CPU to process IPIs), and eventually a machine death > (as messages from stalls and lockups append to printk buffer faster than > we are able to print). So these machines are unable to boot with serial > console attached. Another observed issue is that due to slow printk, > hardware discovery is slow and udev times out before kernel manages to > discover all the attached HW. Also during artificial stress testing SATA > disk disappears from the system because its interrupts aren't served for > too long. > > This patch makes printk() completely asynchronous (similar to what > printk_deferred() did until now). It appends message to the kernel > printk buffer and wake_up()s a special dedicated kthread to do the > printing to console. This has the advantage that printing always happens > from a schedulable contex and thus we don't lockup any particular CPU or > even interrupts. Also it has the advantage that printk() is fast and > thus kernel booting is not slowed down by slow serial console. > Disadvantage of this method is that in case of crash there is higher > chance that important messages won't appear in console output (we may > need working scheduling to print message to console). We somewhat > mitigate this risk by switching printk to the original method of > immediate printing to console if oops is in progress. Also for > debugging purposes we provide printk.synchronous kernel parameter which > resorts to the original printk behavior. > > printk() is expected to work under different conditions and in different > scenarios, including corner cases of OOM when all of the workers are busy > (e.g. allocating memory), thus printk() uses its own dedicated printing > kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit > set we potentially can receive delays in printing until workqueue > declares a ->mayday, as noted by Tetsuo Handa). The whole idea remains worrisome. It is definitely making printk() less reliable in the vast majority of cases: what happens if the scheduler is busted or random memory has been scribbled on, etc. All this downside to handle (afaict) one special case. Surely there is another way? For example (but feel free to suggest other approaches!) can we put some limit on the number of extra characters which the printing task will print? Once that limit is hit, new printk callers will spin until they can get in and do some printing themselves. Or something else? > index 3d28b50..c23a5bd 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > printk.time= Show timing data prefixed to each printk message line > Format: <bool> (1/Y/y=enable, 0/N/n=disable) > > + printk.synchronous= > + By default kernel messages are printed to console > + asynchronously (except during early boot or when oops > + is happening). That avoids kernel stalling behind slow > + serial console and thus avoids softlockups, interrupt > + timeouts, or userspace timing out during heavy printing. > + However for debugging problems, printing messages to > + console immediately may be desirable. This option > + enables such behavior. Well, it's good that we have this. It would be better if it was runtime-controllable - changing boot parameters is a bit of a pain. In fact with this approach, your zillions-of-scsi-disks scenario becomes less problematic: do the async offloading during the boot process then switch back to the more reliable sync printing late in boot. > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -46,6 +46,7 @@ > #include <linux/utsname.h> > #include <linux/ctype.h> > #include <linux/uio.h> > +#include <linux/kthread.h> > > #include <asm/uaccess.h> > #include <asm-generic/sections.h> > @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); > static char *log_buf = __log_buf; > static u32 log_buf_len = __LOG_BUF_LEN; > > +/* > + * When true, printing to console will happen synchronously. > + * The default value on UP systems is 'true'. That's rather obvious from the code. Comments should explain "why", not "what". > + */ > +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP); > +module_param_named(synchronous, printk_sync, bool, S_IRUGO); > +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); > + > +/* Printing kthread for async printk */ > +static struct task_struct *printk_kthread; > +/* When `true' - printing thread has messages to print */ > +static bool printk_kthread_need_flush_console; > + > /* Return log buffer address */ > char *log_buf_addr_get(void) > { > @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level, > const char *dict, size_t dictlen, > const char *fmt, va_list args) > { > + /* cpu currently holding logbuf_lock in this function */ > + static unsigned int logbuf_cpu = UINT_MAX; > static bool recursion_bug; > static char textbuf[LOG_LINE_MAX]; > char *text = textbuf; > @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level, > int this_cpu; > int printed_len = 0; > bool in_sched = false; > - /* cpu currently holding logbuf_lock in this function */ > - static unsigned int logbuf_cpu = UINT_MAX; > + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; > > if (level == LOGLEVEL_SCHED) { > level = LOGLEVEL_DEFAULT; > @@ -1757,12 +1772,29 @@ asmlinkage int vprintk_emit(int facility, int level, > if (!in_sched) { > lockdep_off(); > /* > - * Try to acquire and then immediately release the console > - * semaphore. The release will print out buffers and wake up > - * /dev/kmsg and syslog() users. > + * By default we print message to console asynchronously so Nit: this comment down here shouldn't know what the default is. That should be documented up at the printk_sync definition site. > + * that kernel doesn't get stalled due to slow serial console. s/kernel/the kernel/ > + * That can lead to softlockups, lost interrupts, or userspace > + * timing out under heavy printing load. > + * > + * However we resort to synchronous printing of messages during > + * early boot, when synchronous printing was explicitly > + * requested by kernel parameter, or when console_verbose() was s/kernel/a kernel/ > + * called to print everything during panic / oops. We're missing a description of *why* console_verbose() is handled specially. > */ > - if (console_trylock()) > - console_unlock(); > + if (!in_panic && printk_kthread) { We don't really need local variable in_panic. I guess it has some documentary value. > + /* Offload printing to a schedulable context. */ > + printk_kthread_need_flush_console = true; > + wake_up_process(printk_kthread); > + } else { > + /* > + * Try to acquire and then immediately release the > + * console semaphore. The release will print out > + * buffers and wake up /dev/kmsg and syslog() users. > + */ > + if (console_trylock()) > + console_unlock(); > + } > lockdep_on(); > } > > @@ -2722,6 +2754,47 @@ static int __init printk_late_init(void) > late_initcall(printk_late_init); > > #if defined CONFIG_PRINTK > +static int printk_kthread_func(void *data) > +{ > + while (1) { > + set_current_state(TASK_INTERRUPTIBLE); > + if (!printk_kthread_need_flush_console) > + schedule(); > + > + __set_current_state(TASK_RUNNING); > + /* > + * Avoid an infinite loop when console_unlock() cannot > + * access consoles, e.g. because console_suspended is > + * true. schedule(), someone else will print the messages > + * from resume_console(). > + */ > + printk_kthread_need_flush_console = false; > + > + console_lock(); > + console_unlock(); > + } > + > + return 0; > +} > + > +static int __init init_printk_kthread(void) > +{ > + struct task_struct *thread; > + > + if (printk_sync) > + return 0; > + > + thread = kthread_run(printk_kthread_func, NULL, "printk"); This gets normal scheduling policy, so a spinning userspace SCHED_FIFO task will block printk for ever. This seems bad. > + if (IS_ERR(thread)) { > + pr_err("printk: unable to create printing thread\n"); > + printk_sync = true; > + } else { > + printk_kthread = thread; > + } > + return 0; > +} > +late_initcall(init_printk_kthread); Could do with a comment explaining why late_initcall was chosen. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-04 22:51 ` Andrew Morton @ 2016-04-05 5:17 ` Sergey Senozhatsky 2016-04-05 7:39 ` Sergey Senozhatsky 2016-04-06 8:27 ` Jan Kara 1 sibling, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-05 5:17 UTC (permalink / raw) To: Andrew Morton Cc: Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Jan Kara Hello Andrew, On (04/04/16 15:51), Andrew Morton wrote: [..] > The whole idea remains worrisome. It is definitely making printk() > less reliable in the vast majority of cases: what happens if the > scheduler is busted or random memory has been scribbled on, etc. yes. well, printk, in some sense, already depend on the scheduler: console semaphore on its own; cond_resched() from console_unlock() with console_sem being locked, etc. neither memory corruption is something that printk() can always handle nicely. I saw logbuf_lock corruption and recursive spin_dump from vprintk_emit(), was quite dramatic. > All this downside to handle (afaict) one special case. well, it's not just to make zillions-of-scsi-disks happy. I'm facing different scenarios, more general ones, a 'moderate printk abuse' (combined with slow serial console). printk is not always friendly and shiny, it has some secrets and it can bite easily (lockups, stalls, starvations, sched throttling, et cetera), and this is not something that every developer know. > Surely there is another way? For example (but feel free to suggest other > approaches!) can we put some limit on the number of extra characters which > the printing task will print? Once that limit is hit, new printk callers > will spin until they can get in and do some printing themselves. Or > something else? hm... there are not so many options, I think. this busy wait, depending on the number of CPUs (and some other factors), can provoke mass softlockups on other CPUs and a number of bad things. printk() and its deferred version can be called from any context, so in some cases spinning in printk is as good as doing console_unlock()->call_concosle_drivers() loop (iow, not really good). things are not so bad (well, Tetsuo has some issues here though) if printk() is called from non-atomic context, since now we have cond_resched() in console_unlock() for console_lock()/console_trylock() callers; but printk() from atomic context is still a problem -- we need to offload the actual printing, unless we can guarantee that every atomic printk will be followed by non-atomic printk (which will do the printing). > > + printk.synchronous= .. > Well, it's good that we have this. > > It would be better if it was runtime-controllable - changing boot > parameters is a bit of a pain. In fact with this approach, your > zillions-of-scsi-disks scenario becomes less problematic: do the async > offloading during the boot process then switch back to the more > reliable sync printing late in boot. well, I can add it if you insist. my personal opinion is to keep it RO; RO->RW transition is easier than RW->RO. giving the control over printk behaviour to user space can potentially be even worse than drop_caches. besides I couldn't clearly understand based on what observations user space may decide to switch printk back to sync mode? and what may cause user space to switch printk back from sync to async... lockups in dmesg output? any hint? .. > > + * When true, printing to console will happen synchronously. > > + * The default value on UP systems is 'true'. > > That's rather obvious from the code. Comments should explain "why", > not "what". fair enough. > > + * By default we print message to console asynchronously so > > Nit: this comment down here shouldn't know what the default is. That > should be documented up at the printk_sync definition site. ok. > > + * that kernel doesn't get stalled due to slow serial console. > > s/kernel/the kernel/ ok. > > + * requested by kernel parameter, or when console_verbose() was > > s/kernel/a kernel/ ok. > > > + * called to print everything during panic / oops. > > We're missing a description of *why* console_verbose() is handled > specially. ok. > > - if (console_trylock()) > > - console_unlock(); > > + if (!in_panic && printk_kthread) { > > We don't really need local variable in_panic. I guess it has some > documentary value. just a shorter version of "console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH". > > + > > + thread = kthread_run(printk_kthread_func, NULL, "printk"); > > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > task will block printk for ever. This seems bad. yes, using SCHED_RR/FIFO policy here makes sense. > > +late_initcall(init_printk_kthread); > > Could do with a comment explaining why late_initcall was chosen. late_initcall was chosen because of workqueue early_initcall, and I just decided not to change it when I switched from wq to a dedicated printk kthread. late_initcall seemed to be OK. can do init_printk_kthread() somewhere in init/main start_kernel(). -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-05 5:17 ` Sergey Senozhatsky @ 2016-04-05 7:39 ` Sergey Senozhatsky 2016-04-06 0:19 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-05 7:39 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Andrew Morton, Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Jan Kara On (04/05/16 14:17), Sergey Senozhatsky wrote: [..] > > Could do with a comment explaining why late_initcall was chosen. > > late_initcall was chosen because of workqueue early_initcall, and > I just decided not to change it when I switched from wq to a > dedicated printk kthread. late_initcall seemed to be OK. can do > init_printk_kthread() somewhere in init/main start_kernel(). or rather move it a bit earlier. core_init sounds appropriate; or postcore. don't want to export yet another printk symbol. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-05 7:39 ` Sergey Senozhatsky @ 2016-04-06 0:19 ` Sergey Senozhatsky 0 siblings, 0 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-06 0:19 UTC (permalink / raw) To: Andrew Morton Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Jan Kara Hello, so I currently have something like this. untouched things: - module param is RO; offload printing to kthread (I'll wait for opinions/replies). ======= >From 1e7de1f9a590d8f23609f943362768e4c14580cc Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Tue, 5 Apr 2016 01:57:27 +0900 Subject: [PATCH] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and wake_up()s a special dedicated kthread to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory), thus printk() uses its own dedicated printing kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit set we potentially can receive delays in printing until workqueue declares a ->mayday, as noted by Tetsuo Handa). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/kernel-parameters.txt | 10 ++++ kernel/printk/printk.c | 94 ++++++++++++++++++++++++++++++++++--- 2 files changed, 97 insertions(+), 7 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 46c0b19..733a13d 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3117,6 +3117,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index bfbf284..217f27a 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include <linux/utsname.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/kthread.h> #include <asm/uaccess.h> #include <asm-generic/sections.h> @@ -284,6 +285,16 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* Control whether printing to console must be synchronous. */ +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP); +module_param_named(synchronous, printk_sync, bool, S_IRUGO); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +/* Printing kthread for async printk */ +static struct task_struct *printk_kthread; +/* When `true' printing thread has messages to print */ +static bool printk_kthread_need_flush_console; + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1608,6 +1619,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1617,8 +1630,6 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; @@ -1757,12 +1768,35 @@ asmlinkage int vprintk_emit(int facility, int level, if (!in_sched) { lockdep_off(); /* - * Try to acquire and then immediately release the console - * semaphore. The release will print out buffers and wake up - * /dev/kmsg and syslog() users. + * Attempt to print the messages to console asynchronously so + * that the kernel doesn't get stalled due to slow serial + * console. That can lead to softlockups, lost interrupts, or + * userspace timing out under heavy printing load. + * + * However we resort to synchronous printing of messages during + * early boot, when synchronous printing was explicitly + * requested by a kernel parameter, or when console_verbose() + * was called to print everything during panic / oops. + * Unlike bust_spinlocks() and oops_in_progress, + * console_verbose() sets console_loglevel to MOTORMOUTH and + * never clears it, while oops_in_progress can go back to 0, + * switching printk back to async mode; we want printk to + * operate in sync mode once panic() occurred. */ - if (console_trylock()) - console_unlock(); + if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH && + printk_kthread) { + /* Offload printing to a schedulable context. */ + printk_kthread_need_flush_console = true; + wake_up_process(printk_kthread); + } else { + /* + * Try to acquire and then immediately release the + * console semaphore. The release will print out + * buffers and wake up /dev/kmsg and syslog() users. + */ + if (console_trylock()) + console_unlock(); + } lockdep_on(); } @@ -2722,6 +2756,52 @@ static int __init printk_late_init(void) late_initcall(printk_late_init); #if defined CONFIG_PRINTK +static int printk_kthread_func(void *data) +{ + while (1) { + set_current_state(TASK_INTERRUPTIBLE); + if (!printk_kthread_need_flush_console) + schedule(); + + __set_current_state(TASK_RUNNING); + /* + * Avoid an infinite loop when console_unlock() cannot + * access consoles, e.g. because console_suspended is + * true. schedule(), someone else will print the messages + * from resume_console(). + */ + printk_kthread_need_flush_console = false; + + console_lock(); + console_unlock(); + } + + return 0; +} + +static int __init init_printk_kthread(void) +{ + struct task_struct *thread; + + if (printk_sync) + return 0; + + thread = kthread_run(printk_kthread_func, NULL, "printk"); + if (IS_ERR(thread)) { + pr_err("printk: unable to create printing thread\n"); + printk_sync = true; + } else { + struct sched_param param = { + .sched_priority = MAX_RT_PRIO - 1, + }; + + sched_setscheduler(thread, SCHED_FIFO, ¶m); + printk_kthread = thread; + } + return 0; +} +core_initcall(init_printk_kthread); + /* * Delayed printk version, for scheduler-internal messages: */ -- 2.8.0.rc0.1.gd285ab0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-04 22:51 ` Andrew Morton 2016-04-05 5:17 ` Sergey Senozhatsky @ 2016-04-06 8:27 ` Jan Kara 2016-04-07 9:48 ` Sergey Senozhatsky 2016-08-10 21:17 ` Viresh Kumar 1 sibling, 2 replies; 39+ messages in thread From: Jan Kara @ 2016-04-06 8:27 UTC (permalink / raw) To: Andrew Morton Cc: Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Jan Kara On Mon 04-04-16 15:51:49, Andrew Morton wrote: > On Tue, 5 Apr 2016 01:57:27 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > > > From: Jan Kara <jack@suse.cz> > > > > Currently, printk() sometimes waits for message to be printed to console > > and sometimes it does not (when console_sem is held by some other > > process). In case printk() grabs console_sem and starts printing to > > console, it prints messages from kernel printk buffer until the buffer > > is empty. When serial console is attached, printing is slow and thus > > other CPUs in the system have plenty of time to append new messages to > > the buffer while one CPU is printing. Thus the CPU can spend unbounded > > amount of time doing printing in console_unlock(). This is especially > > serious problem if the printk() calling console_unlock() was called with > > interrupts disabled. > > > > In practice users have observed a CPU can spend tens of seconds printing > > in console_unlock() (usually during boot when hundreds of SCSI devices > > are discovered) resulting in RCU stalls (CPU doing printing doesn't > > reach quiescent state for a long time), softlockup reports (IPIs for the > > printing CPU don't get served and thus other CPUs are spinning waiting > > for the printing CPU to process IPIs), and eventually a machine death > > (as messages from stalls and lockups append to printk buffer faster than > > we are able to print). So these machines are unable to boot with serial > > console attached. Another observed issue is that due to slow printk, > > hardware discovery is slow and udev times out before kernel manages to > > discover all the attached HW. Also during artificial stress testing SATA > > disk disappears from the system because its interrupts aren't served for > > too long. > > > > This patch makes printk() completely asynchronous (similar to what > > printk_deferred() did until now). It appends message to the kernel > > printk buffer and wake_up()s a special dedicated kthread to do the > > printing to console. This has the advantage that printing always happens > > from a schedulable contex and thus we don't lockup any particular CPU or > > even interrupts. Also it has the advantage that printk() is fast and > > thus kernel booting is not slowed down by slow serial console. > > Disadvantage of this method is that in case of crash there is higher > > chance that important messages won't appear in console output (we may > > need working scheduling to print message to console). We somewhat > > mitigate this risk by switching printk to the original method of > > immediate printing to console if oops is in progress. Also for > > debugging purposes we provide printk.synchronous kernel parameter which > > resorts to the original printk behavior. > > > > printk() is expected to work under different conditions and in different > > scenarios, including corner cases of OOM when all of the workers are busy > > (e.g. allocating memory), thus printk() uses its own dedicated printing > > kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit > > set we potentially can receive delays in printing until workqueue > > declares a ->mayday, as noted by Tetsuo Handa). > > The whole idea remains worrisome. It is definitely making printk() > less reliable in the vast majority of cases: what happens if the > scheduler is busted or random memory has been scribbled on, etc. So the scheduler dependency was always there due to handling of console_sem and some console drivers possibly scheduling out while holding console_sem. Recently the dependency has increased by adding cond_resched() into console_unlock() loop in 8d91f8b15361 (printk: do cond_resched() between lines while outputting to consoles). Also the changelog of that commit explains another case besides 'tons of SCSI devices' which leads to printk lockups. I have also seen a case where printing of 'martian packets' killed the machine because it was too intensive (but it has been an older kernel and I think networking people have added some ratelimiting for those messages so it shouldn't happen anymore). Now this commit makes the dependency on the scheduler even harder - unless we know something is wrong (in_panic set) we rely on scheduler to schedule the printing kthread. So I agree there is some risk in this patch. > All this downside to handle (afaict) one special case. Surely there is > another way? For example (but feel free to suggest other approaches!) > can we put some limit on the number of extra characters which the > printing task will print? Once that limit is hit, new printk callers > will spin until they can get in and do some printing themselves. Or > something else? We have been through this a few times without much success. Last revision of my patches did what you suggest but although the idea sounds simple, the implementation turned out to be relatively complex and even you didn't quite like it [1]. Moreover when I raised this issue at Kernel Summit 2015, Hannes Reinecke mentioned that even these patches are not enough to make his huge machine boot with serial console because printing still delays device probing too much and systemd times out during boot. Then Linus said that he'd want to keep things simple and just make printk asynchronous and switch to synchronous in case problems are detected - which is what I and Sergey have implemented in this patch. So currently I'm of the opinion that we should merge this patch and see how well it works in reality. I've tested it and it works for me and Sergey. Maybe I can try breaking it some more by crashing the in various ways kernel if it makes you calmer. In case we see problems in some setups, there's always the kernel option to return to the original behavior and we can work from there improving what we have... [1] Message-Id: <20150918151411.a3fa65c3e4f33f9f2ddf1fd8@linux-foundation.org> > > index 3d28b50..c23a5bd 100644 > > --- a/Documentation/kernel-parameters.txt > > +++ b/Documentation/kernel-parameters.txt > > @@ -3122,6 +3122,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. > > printk.time= Show timing data prefixed to each printk message line > > Format: <bool> (1/Y/y=enable, 0/N/n=disable) > > > > + printk.synchronous= > > + By default kernel messages are printed to console > > + asynchronously (except during early boot or when oops > > + is happening). That avoids kernel stalling behind slow > > + serial console and thus avoids softlockups, interrupt > > + timeouts, or userspace timing out during heavy printing. > > + However for debugging problems, printing messages to > > + console immediately may be desirable. This option > > + enables such behavior. > > Well, it's good that we have this. > > It would be better if it was runtime-controllable - changing boot > parameters is a bit of a pain. In fact with this approach, your > zillions-of-scsi-disks scenario becomes less problematic: do the async > offloading during the boot process then switch back to the more > reliable sync printing late in boot. Doing this should be relatively easy. It would be userspace's decision whether they want more reliable or faster printk. Sounds fine with me. > > +static int __init init_printk_kthread(void) > > +{ > > + struct task_struct *thread; > > + > > + if (printk_sync) > > + return 0; > > + > > + thread = kthread_run(printk_kthread_func, NULL, "printk"); > > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > task will block printk for ever. This seems bad. I have to research this a bit but won't the SCHED_FIFO task that has potentially unbounded amount of work lockup the CPU even though it does occasional cond_resched()? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-06 8:27 ` Jan Kara @ 2016-04-07 9:48 ` Sergey Senozhatsky 2016-04-07 12:08 ` Sergey Senozhatsky 2016-08-10 21:17 ` Viresh Kumar 1 sibling, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-07 9:48 UTC (permalink / raw) To: Jan Kara Cc: Andrew Morton, Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky Hello, On (04/06/16 10:27), Jan Kara wrote: [..] > > Well, it's good that we have this. > > > > It would be better if it was runtime-controllable - changing boot > > parameters is a bit of a pain. In fact with this approach, your > > zillions-of-scsi-disks scenario becomes less problematic: do the async > > offloading during the boot process then switch back to the more > > reliable sync printing late in boot. > > Doing this should be relatively easy. It would be userspace's decision > whether they want more reliable or faster printk. Sounds fine with me. I can add it as a separate patch to the series. should be quite trivial. I have [minor] concerns, though. I can see how, for example, user space can decide what logging level it wants '1 4 4 7' or anything else, but how can user space decide what printk implementation it wants to use? I'm more or less positive not to back-port that `synchronous RW' patch to the kernels that I use; just because I don't want to give this freedom to people, sync printk is something I'm trying to run away from. > > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > > task will block printk for ever. This seems bad. > > I have to research this a bit but won't the SCHED_FIFO task that has > potentially unbounded amount of work lockup the CPU even though it does > occasional cond_resched()? depending on `watchdog_thresh' value, it can take something like 20+ seconds before watchdog will notice softlockup. so I'm setting printk kthread prio to `MAX_RT_PRIO - 1' as of now, just in case. I think I'll leave printk kthread init as a late_initcall. probably would prefer core/arch/device init calls to happen in sync printk mode. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-07 9:48 ` Sergey Senozhatsky @ 2016-04-07 12:08 ` Sergey Senozhatsky 2016-04-07 13:15 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-07 12:08 UTC (permalink / raw) To: Jan Kara, Andrew Morton Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park Hello, On (04/07/16 18:48), Sergey Senozhatsky wrote: > On (04/06/16 10:27), Jan Kara wrote: > [..] > > > Well, it's good that we have this. > > > > > > It would be better if it was runtime-controllable - changing boot > > > parameters is a bit of a pain. In fact with this approach, your > > > zillions-of-scsi-disks scenario becomes less problematic: do the async > > > offloading during the boot process then switch back to the more > > > reliable sync printing late in boot. > > > > Doing this should be relatively easy. It would be userspace's decision > > whether they want more reliable or faster printk. Sounds fine with me. ok, after some thinking -- it makes a lot of sense to have it. good old sync printk is potentially more reliable after all. I think I also now want to meke the 'default' for printk_sync being 'true'. so we won't spoil printk on the systems that never had any problems with it. at least for one release cycle, may be. thus, people would need to request async printk via boot param and switch back to sync printk once the booting process is done [or keep async printk]. how does that sound? -ss > > > This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > > > task will block printk for ever. This seems bad. > > > > I have to research this a bit but won't the SCHED_FIFO task that has > > potentially unbounded amount of work lockup the CPU even though it does > > occasional cond_resched()? > > depending on `watchdog_thresh' value, it can take something like 20+ > seconds before watchdog will notice softlockup. > so I'm setting printk kthread prio to `MAX_RT_PRIO - 1' as of now, > just in case. > > I think I'll leave printk kthread init as a late_initcall. probably > would prefer core/arch/device init calls to happen in sync printk mode. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-07 12:08 ` Sergey Senozhatsky @ 2016-04-07 13:15 ` Jan Kara 0 siblings, 0 replies; 39+ messages in thread From: Jan Kara @ 2016-04-07 13:15 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park On Thu 07-04-16 21:08:26, Sergey Senozhatsky wrote: > Hello, > > On (04/07/16 18:48), Sergey Senozhatsky wrote: > > On (04/06/16 10:27), Jan Kara wrote: > > [..] > > > > Well, it's good that we have this. > > > > > > > > It would be better if it was runtime-controllable - changing boot > > > > parameters is a bit of a pain. In fact with this approach, your > > > > zillions-of-scsi-disks scenario becomes less problematic: do the async > > > > offloading during the boot process then switch back to the more > > > > reliable sync printing late in boot. > > > > > > Doing this should be relatively easy. It would be userspace's decision > > > whether they want more reliable or faster printk. Sounds fine with me. > > ok, after some thinking -- it makes a lot of sense to have it. good old > sync printk is potentially more reliable after all. I think I also now > want to meke the 'default' for printk_sync being 'true'. so we won't > spoil printk on the systems that never had any problems with it. at > least for one release cycle, may be. thus, people would need to request > async printk via boot param and switch back to sync printk once the booting > process is done [or keep async printk]. > > how does that sound? That is fine with me. We can always enable this by default in our distro so that it gets more exposure and big machines are able to boot... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-04-06 8:27 ` Jan Kara 2016-04-07 9:48 ` Sergey Senozhatsky @ 2016-08-10 21:17 ` Viresh Kumar 2016-08-12 9:44 ` Petr Mladek 1 sibling, 1 reply; 39+ messages in thread From: Viresh Kumar @ 2016-08-10 21:17 UTC (permalink / raw) To: Jan Kara, Andrew Morton, Sergey Senozhatsky Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, vlevenetz, Greg Kroah-Hartman +Vladi/Greg, On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote: > On Mon 04-04-16 15:51:49, Andrew Morton wrote: >> > +static int __init init_printk_kthread(void) >> > +{ >> > + struct task_struct *thread; >> > + >> > + if (printk_sync) >> > + return 0; >> > + >> > + thread = kthread_run(printk_kthread_func, NULL, "printk"); >> >> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO >> task will block printk for ever. This seems bad. > > I have to research this a bit but won't the SCHED_FIFO task that has > potentially unbounded amount of work lockup the CPU even though it does > occasional cond_resched()? We are facing complete hogs because of the printk thread being a SCHED_FIFO task and have this patch to fix it up for now. Author: Vladislav Levenetz <vblagoev@mm-sol.com> Date: Wed Aug 10 13:58:00 2016 -0700 SW-7786: printk: Lower the priority of printk thread Flooding the console (with a test module) in a tight loop indefinitely makes android user interface very sluggish. Opening YouTube app and the device hangs and becomes even more unresponsive to the point it completely hangs. The asynchronous printk thread is a SCHED FIFO thread with priority MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO) instead, we observe much better performance using the same printk flood test. We don't even notice any kind of sluggishness during device usage. We can play a YouTube clip smoothly and use the device normally in general. The kernel log looks fine as well, as the flood of messages continue normally. Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- kernel/printk/printk.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index c32872872cb6..ad5b30e5e6d9 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data) static int __init_printk_kthread(void) { struct task_struct *thread; - struct sched_param param = { - .sched_priority = MAX_RT_PRIO - 1, - }; if (!printk_kthread_can_run || printk_sync || printk_kthread) return 0; @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void) return PTR_ERR(thread); } - sched_setscheduler(thread, SCHED_FIFO, ¶m); printk_kthread = thread; return 0; } ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-10 21:17 ` Viresh Kumar @ 2016-08-12 9:44 ` Petr Mladek 2016-08-15 14:26 ` Vladislav Levenetz 2016-08-18 2:27 ` Sergey Senozhatsky 0 siblings, 2 replies; 39+ messages in thread From: Petr Mladek @ 2016-08-12 9:44 UTC (permalink / raw) To: Viresh Kumar Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, vlevenetz, Greg Kroah-Hartman On Wed 2016-08-10 14:17:55, Viresh Kumar wrote: > +Vladi/Greg, > > On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote: > > On Mon 04-04-16 15:51:49, Andrew Morton wrote: > > >> > +static int __init init_printk_kthread(void) > >> > +{ > >> > + struct task_struct *thread; > >> > + > >> > + if (printk_sync) > >> > + return 0; > >> > + > >> > + thread = kthread_run(printk_kthread_func, NULL, "printk"); > >> > >> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > >> task will block printk for ever. This seems bad. > > > > I have to research this a bit but won't the SCHED_FIFO task that has > > potentially unbounded amount of work lockup the CPU even though it does > > occasional cond_resched()? > > We are facing complete hogs because of the printk thread being a SCHED_FIFO > task and have this patch to fix it up for now. > > Author: Vladislav Levenetz <vblagoev@mm-sol.com> > Date: Wed Aug 10 13:58:00 2016 -0700 > > SW-7786: printk: Lower the priority of printk thread > > Flooding the console (with a test module) in a tight loop indefinitely > makes android user interface very sluggish. Opening YouTube app and the > device hangs and becomes even more unresponsive to the point it > completely hangs. > > The asynchronous printk thread is a SCHED FIFO thread with priority > MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO) > instead, we observe much better performance using the same printk flood > test. We don't even notice any kind of sluggishness during device usage. > We can play a YouTube clip smoothly and use the device normally in > general. The kernel log looks fine as well, as the flood of messages > continue normally. > > Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > kernel/printk/printk.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index c32872872cb6..ad5b30e5e6d9 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data) > static int __init_printk_kthread(void) > { > struct task_struct *thread; > - struct sched_param param = { > - .sched_priority = MAX_RT_PRIO - 1, > - }; > > if (!printk_kthread_can_run || printk_sync || printk_kthread) > return 0; > @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void) > return PTR_ERR(thread); > } > > - sched_setscheduler(thread, SCHED_FIFO, ¶m); > printk_kthread = thread; > return 0; > } IMHO, this is fine. We force the synchronous mode in critical situations anyway. But I was curious if we could hit a printk from the wake_up_process(). The change above causes using the fair scheduler and there is the following call chain [*] vprintk_emit() -> wake_up_process() -> try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_activate() -> activate_task() -> enqueue_task() -> enqueue_task_fair() via p->sched_class->enqueue_task -> cfs_rq_of() -> task_of() -> WARN_ON_ONCE(!entity_is_task(se)) We should never trigger this because printk_kthread is a task. But what if the date gets inconsistent? Then there is the following chain: vprintk_emit() -> wake_up_process() -> try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_activate() -> activate_task() -> enqueue_task() -> enqueue_task_fair() via p->sched_class->enqueue_task ->hrtick_update() -> hrtick_start_fair() -> WARN_ON(task_rq(p) != rq) This looks like another paranoid consistency check that might be triggered when the scheduler gets messed. I see few possible solutions: 1. Replace the WARN_ONs by printk_deferred(). This is the usual solution but it would make debugging less convenient. 2. Force synchronous printk inside WARN()/BUG() macros. This would make sense even from other reasons. These are printed when the system is in a strange state. There is no guarantee that the printk_kthread will get scheduled. 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU printk_func. It might be elegant. But we do not want this outside the scheduler code. Therefore we would need special variants of WARN_*_SCHED() BUG_*_SCHED() macros. I personally prefer the 2nd solution. What do you think about it, please? Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-12 9:44 ` Petr Mladek @ 2016-08-15 14:26 ` Vladislav Levenetz 2016-08-16 9:04 ` Petr Mladek 2016-08-18 2:27 ` Sergey Senozhatsky 1 sibling, 1 reply; 39+ messages in thread From: Vladislav Levenetz @ 2016-08-15 14:26 UTC (permalink / raw) To: Petr Mladek, Viresh Kumar Cc: Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Greg Kroah-Hartman On 08/12/2016 12:44 PM, Petr Mladek wrote: > On Wed 2016-08-10 14:17:55, Viresh Kumar wrote: >> +Vladi/Greg, >> >> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@suse.cz> wrote: >>> On Mon 04-04-16 15:51:49, Andrew Morton wrote: >>>>> +static int __init init_printk_kthread(void) >>>>> +{ >>>>> + struct task_struct *thread; >>>>> + >>>>> + if (printk_sync) >>>>> + return 0; >>>>> + >>>>> + thread = kthread_run(printk_kthread_func, NULL, "printk"); >>>> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO >>>> task will block printk for ever. This seems bad. >>> I have to research this a bit but won't the SCHED_FIFO task that has >>> potentially unbounded amount of work lockup the CPU even though it does >>> occasional cond_resched()? >> We are facing complete hogs because of the printk thread being a SCHED_FIFO >> task and have this patch to fix it up for now. >> >> Author: Vladislav Levenetz <vblagoev@mm-sol.com> >> Date: Wed Aug 10 13:58:00 2016 -0700 >> >> SW-7786: printk: Lower the priority of printk thread >> >> Flooding the console (with a test module) in a tight loop indefinitely >> makes android user interface very sluggish. Opening YouTube app and the >> device hangs and becomes even more unresponsive to the point it >> completely hangs. >> >> The asynchronous printk thread is a SCHED FIFO thread with priority >> MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO) >> instead, we observe much better performance using the same printk flood >> test. We don't even notice any kind of sluggishness during device usage. >> We can play a YouTube clip smoothly and use the device normally in >> general. The kernel log looks fine as well, as the flood of messages >> continue normally. >> >> Signed-off-by: Vladislav Levenetz <vblagoev@mm-sol.com> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> >> --- >> kernel/printk/printk.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index c32872872cb6..ad5b30e5e6d9 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data) >> static int __init_printk_kthread(void) >> { >> struct task_struct *thread; >> - struct sched_param param = { >> - .sched_priority = MAX_RT_PRIO - 1, >> - }; >> >> if (!printk_kthread_can_run || printk_sync || printk_kthread) >> return 0; >> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void) >> return PTR_ERR(thread); >> } >> >> - sched_setscheduler(thread, SCHED_FIFO, ¶m); >> printk_kthread = thread; >> return 0; >> } > IMHO, this is fine. We force the synchronous mode in critical > situations anyway. > > But I was curious if we could hit a printk from the wake_up_process(). > The change above causes using the fair scheduler and there is > the following call chain [*] > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > -> cfs_rq_of() > -> task_of() > -> WARN_ON_ONCE(!entity_is_task(se)) > > We should never trigger this because printk_kthread is a task. > But what if the date gets inconsistent? > > Then there is the following chain: > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > ->hrtick_update() > -> hrtick_start_fair() > -> WARN_ON(task_rq(p) != rq) > > This looks like another paranoid consistency check that might be > triggered when the scheduler gets messed. > > I see few possible solutions: > > 1. Replace the WARN_ONs by printk_deferred(). > > This is the usual solution but it would make debugging less convenient. > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > This would make sense even from other reasons. These are printed > when the system is in a strange state. There is no guarantee that > the printk_kthread will get scheduled. > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > printk_func. > > It might be elegant. But we do not want this outside the scheduler > code. Therefore we would need special variants of WARN_*_SCHED() > BUG_*_SCHED() macros. > > > I personally prefer the 2nd solution. What do you think about it, > please? > > > Best Regards, > Petr Hi Petr, Sorry with for the late reply. Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if only a single CPU is running. We already had such a situation with system suspend. During a specific test on our device sometimes we hit a WARN from the time keeping core. (Cannot recall which one exactly. Viresh have it) from a printk wake_up path during system suspend and with already only one CPU running. So we were forced to make printing synchronous in the suspend path prior disabling all non-boot cpu's. Your solution number 2) sounds reasonable to me. I'm wondering if we could hit a WARN()/BUG() somewhere from the fair scheduler like the example you made for the RT sched? Thanks. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-15 14:26 ` Vladislav Levenetz @ 2016-08-16 9:04 ` Petr Mladek 0 siblings, 0 replies; 39+ messages in thread From: Petr Mladek @ 2016-08-16 9:04 UTC (permalink / raw) To: Vladislav Levenetz Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Greg Kroah-Hartman On Mon 2016-08-15 17:26:50, Vladislav Levenetz wrote: > On 08/12/2016 12:44 PM, Petr Mladek wrote: > >But I was curious if we could hit a printk from the wake_up_process(). > >The change above causes using the fair scheduler and there is > >the following call chain [*] > > > > vprintk_emit() > > -> wake_up_process() > > -> try_to_wake_up() > > -> ttwu_queue() > > -> ttwu_do_activate() > > -> ttwu_activate() > > -> activate_task() > > -> enqueue_task() > > -> enqueue_task_fair() via p->sched_class->enqueue_task > > -> cfs_rq_of() > > -> task_of() > > -> WARN_ON_ONCE(!entity_is_task(se)) > > > >We should never trigger this because printk_kthread is a task. > >But what if the date gets inconsistent? > > > >Then there is the following chain: > > > > vprintk_emit() > > -> wake_up_process() > > -> try_to_wake_up() > > -> ttwu_queue() > > -> ttwu_do_activate() > > -> ttwu_activate() > > -> activate_task() > > -> enqueue_task() > > -> enqueue_task_fair() via p->sched_class->enqueue_task > > ->hrtick_update() > > -> hrtick_start_fair() > > -> WARN_ON(task_rq(p) != rq) > > > >This looks like another paranoid consistency check that might be > >triggered when the scheduler gets messed. > > > >I see few possible solutions: > > > >1. Replace the WARN_ONs by printk_deferred(). > > > > This is the usual solution but it would make debugging less convenient. > > > > > >2. Force synchronous printk inside WARN()/BUG() macros. > > > > This would make sense even from other reasons. These are printed > > when the system is in a strange state. There is no guarantee that > > the printk_kthread will get scheduled. > > > > > >3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > > printk_func. > > > > It might be elegant. But we do not want this outside the scheduler > > code. Therefore we would need special variants of WARN_*_SCHED() > > BUG_*_SCHED() macros. > > > > > >I personally prefer the 2nd solution. What do you think about it, > >please? > > > > > >Best Regards, > >Petr > > Hi Petr, > > Sorry with for the late reply. No problem. > Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if > only a single CPU is running. I think that the deadlock might happen also with more CPUs if the async_printk() is enabled. I mean: printk_emit() wake_up_process() try_to_wake_up() raw_spin_lock_irqsave(&p->pi_lock, flags) !!!! ttwu_queue() ttwu_do_activate() ttwu_activate() activate_task() enqueue_task() enqueue_task_fair() via p->sched_class->enqueue_task hrtick_update() hrtick_start_fair() WARN_ON(task_rq(p) != rq) printk() vprintk_emit() wake_up_process() try_to_wake_up() raw_spin_lock_irqsave(&p->pi_lock, flags) There is a deadlock because p->pi_lock is already taken by the first try_to_wake_up(). By other words, I think that the single running CPU was only a symptom but it was not the root cause of the deadlock. > We already had such a situation with system suspend. During a > specific test on our device sometimes we hit a WARN from the time > keeping core. (Cannot recall which one exactly. Viresh have it) from > a printk wake_up path during system suspend and with already only > one CPU running. > So we were forced to make printing synchronous in the suspend path > prior disabling all non-boot cpu's. > > Your solution number 2) sounds reasonable to me. Good. > I'm wondering if we could hit a WARN()/BUG() somewhere from the fair > scheduler like the example you made for the RT sched? Unfortunately, it looks like. The example above actually is from the fair scheduler. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-12 9:44 ` Petr Mladek 2016-08-15 14:26 ` Vladislav Levenetz @ 2016-08-18 2:27 ` Sergey Senozhatsky 2016-08-18 9:33 ` Petr Mladek 1 sibling, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-18 2:27 UTC (permalink / raw) To: Petr Mladek Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, vlevenetz, Greg Kroah-Hartman Hello, really sorry for very long reply. On (08/12/16 11:44), Petr Mladek wrote: [..] > IMHO, this is fine. We force the synchronous mode in critical > situations anyway. yes, I think it makes sense to lower the priority (we also have briefly discussed this in private emails with Viresh). I'd still prefer to have forced sync-printk on suspend/hibernate/etc., though. > But I was curious if we could hit a printk from the wake_up_process(). > The change above causes using the fair scheduler and there is > the following call chain [*] > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > -> cfs_rq_of() > -> task_of() > -> WARN_ON_ONCE(!entity_is_task(se)) > > We should never trigger this because printk_kthread is a task. > But what if the date gets inconsistent? > > Then there is the following chain: > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > ->hrtick_update() > -> hrtick_start_fair() > -> WARN_ON(task_rq(p) != rq) > > This looks like another paranoid consistency check that might be > triggered when the scheduler gets messed. > > I see few possible solutions: > > 1. Replace the WARN_ONs by printk_deferred(). > > This is the usual solution but it would make debugging less convenient. what I did internally was a combination of #1 and #3: I introduced a dump_stack_deferred() function which is basically (almost) a copy-past of dump_stack() from lib/dump_stack.c with the difference that it calls printk_deferred(). and added a WARN_ON_DEFERRED() macro. > 2. Force synchronous printk inside WARN()/BUG() macros. will it help? semaphore up() calls wake_up_process() regardless the context. not to mention that we still may have spin_dump() enabled. > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > printk_func. > > It might be elegant. But we do not want this outside the scheduler > code. Therefore we would need special variants of WARN_*_SCHED() > BUG_*_SCHED() macros. > > I personally prefer the 2nd solution. What do you think about it, > please? I personally think a combo of #1 and #3 is a bit better than plain #2. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-18 2:27 ` Sergey Senozhatsky @ 2016-08-18 9:33 ` Petr Mladek 2016-08-18 9:51 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-18 9:33 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Thu 2016-08-18 11:27:12, Sergey Senozhatsky wrote: > Hello, > > really sorry for very long reply. > > On (08/12/16 11:44), Petr Mladek wrote: > [..] > > IMHO, this is fine. We force the synchronous mode in critical > > situations anyway. > > yes, I think it makes sense to lower the priority (we also have > briefly discussed this in private emails with Viresh). I'd still > prefer to have forced sync-printk on suspend/hibernate/etc., though. Sounds fine to me. > > But I was curious if we could hit a printk from the wake_up_process(). > > The change above causes using the fair scheduler and there is > > the following call chain [*] > > > > I see few possible solutions: > > > > 1. Replace the WARN_ONs by printk_deferred(). > > > > This is the usual solution but it would make debugging less convenient. > > what I did internally was a combination of #1 and #3: I introduced a > dump_stack_deferred() function which is basically (almost) a copy-past > of dump_stack() from lib/dump_stack.c with the difference that it calls > printk_deferred(). and added a WARN_ON_DEFERRED() macro. > > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > will it help? semaphore up() calls wake_up_process() regardless the context. > not to mention that we still may have spin_dump() enabled. Good point. That changes my preferences :-) > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > > printk_func. > > > > It might be elegant. But we do not want this outside the scheduler > > code. Therefore we would need special variants of WARN_*_SCHED() > > BUG_*_SCHED() macros. Also we need to make sure that everything will be done on a single CPU as the printk_func is per-CPU variable. > > I personally prefer the 2nd solution. What do you think about it, > > please? > > I personally think a combo of #1 and #3 is a bit better than plain #2. I would need to see the code how it looks and if is really works. But yes, it seems that this is the way to go. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-18 9:33 ` Petr Mladek @ 2016-08-18 9:51 ` Sergey Senozhatsky 2016-08-18 10:56 ` Petr Mladek 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-18 9:51 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/18/16 11:33), Petr Mladek wrote: [..] > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > > > will it help? semaphore up() calls wake_up_process() regardless the context. > > not to mention that we still may have spin_dump() enabled. > > Good point. That changes my preferences :-) > > > > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > > > printk_func. > > > > > > It might be elegant. But we do not want this outside the scheduler > > > code. Therefore we would need special variants of WARN_*_SCHED() > > > BUG_*_SCHED() macros. > > Also we need to make sure that everything will be done on a single CPU > as the printk_func is per-CPU variable. lib/dump_stack.c dump_stack() disables local IRQs, so we will stay on the same CPU. schematically dump_stack_deferred() looks like the one below. I don't dump registers, that will require one more *_deferred() function. and as you can see I've replaced show_stack() with save_stack_trace(). it's not entirely universal (as of now), but worked for my needs at that time. --- asmlinkage __visible void dump_stack_deferred(void) { unsigned long flags; unsigned long stack_trace[16]; int was_locked; int old, cpu; int i; struct stack_trace trace = { .nr_entries = 0, .max_entries = ARRAY_SIZE(stack_trace), .entries = stack_trace, .skip = 0, }; local_irq_save(flags); retry: cpu = smp_processor_id(); old = atomic_cmpxchg(&dump_lock, -1, cpu); if (old == -1) { was_locked = 0; } else if (old == cpu) { was_locked = 1; } else { cpu_relax(); goto retry; } save_stack_trace(&trace); for (i = 0; i < trace.nr_entries; i++) { printk_deferred("[%p] %pS\n", (void *)stack_trace[i], (void *)stack_trace[i]); } if (!was_locked) atomic_set(&dump_lock, -1); local_irq_restore(flags); } --- -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-18 9:51 ` Sergey Senozhatsky @ 2016-08-18 10:56 ` Petr Mladek 2016-08-19 6:32 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-18 10:56 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Thu 2016-08-18 18:51:44, Sergey Senozhatsky wrote: > On (08/18/16 11:33), Petr Mladek wrote: > [..] > > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > > > > > will it help? semaphore up() calls wake_up_process() regardless the context. > > > not to mention that we still may have spin_dump() enabled. > > > > Good point. That changes my preferences :-) > > > > > > > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > > > > printk_func. > > > > > > > > It might be elegant. But we do not want this outside the scheduler > > > > code. Therefore we would need special variants of WARN_*_SCHED() > > > > BUG_*_SCHED() macros. > > > > Also we need to make sure that everything will be done on a single CPU > > as the printk_func is per-CPU variable. > > lib/dump_stack.c dump_stack() disables local IRQs, so we will stay > on the same CPU. It is less important to stay of the same CPU if you use printk_deferred() directly. But it us must to have if you use it via the printk_func per-CPU variable. The advantage of the printk_func trick is that it is transparent. You do not need to modify any existing functions used by WARN()/BUG() macros. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-18 10:56 ` Petr Mladek @ 2016-08-19 6:32 ` Sergey Senozhatsky 2016-08-19 9:54 ` Petr Mladek 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-19 6:32 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/18/16 12:56), Petr Mladek wrote: > On Thu 2016-08-18 18:51:44, Sergey Senozhatsky wrote: > > On (08/18/16 11:33), Petr Mladek wrote: > > [..] > > > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > > > > > > > will it help? semaphore up() calls wake_up_process() regardless the context. > > > > not to mention that we still may have spin_dump() enabled. > > > > > > Good point. That changes my preferences :-) > > > > > > > > > > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > > > > > printk_func. > > > > > > > > > > It might be elegant. But we do not want this outside the scheduler > > > > > code. Therefore we would need special variants of WARN_*_SCHED() > > > > > BUG_*_SCHED() macros. > > > > > > Also we need to make sure that everything will be done on a single CPU > > > as the printk_func is per-CPU variable. > > > > lib/dump_stack.c dump_stack() disables local IRQs, so we will stay > > on the same CPU. > > It is less important to stay of the same CPU if you use > printk_deferred() directly. But it us must to have if you > use it via the printk_func per-CPU variable. > > The advantage of the printk_func trick is that it is transparent. > You do not need to modify any existing functions used by WARN()/BUG() > macros. good point. so something like below, perhaps. I'm less sure about deferred BUG()/BUG_ON(): #define DEFERRED_BUG() do { \ printk_deferred_enter(); \ BUG(); \ printk_deferred_exit(); \ } while (0) \ #define DEFERRED_BUG_ON(condition) do { \ printk_deferred_enter(); \ BUG_ON(condition); \ printk_deferred_exit(); \ } while (0) depending on .config BUG() may never return back -- passing control to do_exit(), so printk_deferred_exit() won't be executed. thus we probably need to have a per-cpu variable that would indicate that we are in deferred_bug. hm... but do we really need deferred BUG() in the first place? Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- include/asm-generic/bug.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/printk.h | 6 ++++++ kernel/printk/internal.h | 18 +---------------- kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++----- 4 files changed, 94 insertions(+), 22 deletions(-) diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h index 6f96247..d72ee1e 100644 --- a/include/asm-generic/bug.h +++ b/include/asm-generic/bug.h @@ -60,6 +60,10 @@ struct bug_entry { * significant issues that need prompt attention if they should ever * appear at runtime. Use the versions with printk format strings * to provide better diagnostics. + * + * DEFERRED_WARN macros call printk_deferred() to print the messages + * and are meant to be used from the contexts where direct printk() + * can deadlock the system. */ #ifndef __WARN_TAINT extern __printf(3, 4) @@ -145,6 +149,52 @@ void __warn(const char *file, int line, void *caller, unsigned taint, unlikely(__ret_warn_once); \ }) +#define DEFERRED_WARN_ON(condition) ({ \ + int __ret_warn_on = !!(condition); \ + if (unlikely(__ret_warn_on)) { \ + printk_deferred_enter(); \ + __WARN(); \ + printk_deferred_exit(); \ + } \ + unlikely(__ret_warn_on); \ +}) + +#define DEFERRED_WARN(condition, format...) ({ \ + int __ret_warn_on = !!(condition); \ + if (unlikely(__ret_warn_on)) { \ + printk_deferred_enter(); \ + __WARN_printf(format); \ + printk_deferred_exit(); \ + } \ + unlikely(__ret_warn_on); \ +}) + +#define DEFERRED_WARN_ON_ONCE(condition) ({ \ + static bool __section(.data.unlikely) __warned; \ + int __ret_warn_once = !!(condition); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ + __warned = true; \ + printk_deferred_enter(); \ + WARN_ON(1); \ + printk_deferred_exit(); \ + } \ + unlikely(__ret_warn_once); \ +}) + +#define DEFERRED_WARN_ONCE(condition, format...) ({ \ + static bool __section(.data.unlikely) __warned; \ + int __ret_warn_once = !!(condition); \ + \ + if (unlikely(__ret_warn_once && !__warned)) { \ + __warned = true; \ + printk_deferred_enter(); \ + WARN(1, format); \ + printk_deferred_exit(); \ + } \ + unlikely(__ret_warn_once); \ +}) + #else /* !CONFIG_BUG */ #ifndef HAVE_ARCH_BUG #define BUG() do {} while (1) diff --git a/include/linux/printk.h b/include/linux/printk.h index 696a56b..5142654 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -193,6 +193,9 @@ void __init setup_log_buf(int early); __printf(1, 2) void dump_stack_set_arch_desc(const char *fmt, ...); void dump_stack_print_info(const char *log_lvl); void show_regs_print_info(const char *log_lvl); + +extern void printk_deferred_enter(void); +extern void printk_deferred_exit(void); #else static inline __printf(1, 0) int vprintk(const char *s, va_list args) @@ -252,6 +255,9 @@ static inline void dump_stack_print_info(const char *log_lvl) static inline void show_regs_print_info(const char *log_lvl) { } + +static void printk_deferred_enter(void) {} +static void printk_deferred_exit(void) {} #endif extern asmlinkage void dump_stack(void) __cold; diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 7fd2838..70f1cf2 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -19,23 +19,12 @@ typedef __printf(1, 0) int (*printk_func_t)(const char *fmt, va_list args); int __printf(1, 0) vprintk_default(const char *fmt, va_list args); +DECLARE_PER_CPU(printk_func_t, printk_func); #ifdef CONFIG_PRINTK_NMI extern raw_spinlock_t logbuf_lock; -/* - * printk() could not take logbuf_lock in NMI context. Instead, - * it temporary stores the strings into a per-CPU buffer. - * The alternative implementation is chosen transparently - * via per-CPU variable. - */ -DECLARE_PER_CPU(printk_func_t, printk_func); -static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) -{ - return this_cpu_read(printk_func)(fmt, args); -} - extern atomic_t nmi_message_lost; static inline int get_nmi_message_lost(void) { @@ -44,11 +33,6 @@ static inline int get_nmi_message_lost(void) #else /* CONFIG_PRINTK_NMI */ -static inline __printf(1, 0) int vprintk_func(const char *fmt, va_list args) -{ - return vprintk_default(fmt, args); -} - static inline int get_nmi_message_lost(void) { return 0; diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 0d3e026..6e260a0 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2026,7 +2026,17 @@ asmlinkage __visible int printk(const char *fmt, ...) int r; va_start(args, fmt); - r = vprintk_func(fmt, args); + /* + * printk() could not take logbuf_lock in NMI context. Instead, + * it temporary stores the strings into a per-CPU buffer. The + * alternative implementation is chosen transparently via per-CPU + * variable. + * + * We also switch printk_func from default to vprintk_deferred() + * when in DEFERRED_WARN()/DEFERRED_BUG(), because it may be + * unsafe to execute vprintk_default(). + */ + r = this_cpu_read(printk_func)(fmt, args); va_end(args); return r; @@ -3037,6 +3047,17 @@ void wake_up_klogd(void) preempt_enable(); } +static int vprintk_deferred(const char *fmt, va_list args) +{ + int r; + + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + + return r; +} + int printk_deferred(const char *fmt, ...) { va_list args; @@ -3044,16 +3065,27 @@ int printk_deferred(const char *fmt, ...) preempt_disable(); va_start(args, fmt); - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + r = vprintk_deferred(fmt, args); va_end(args); - - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); preempt_enable(); return r; } +void printk_deferred_enter(void) +{ + preempt_disable(); + this_cpu_write(printk_func, vprintk_deferred); +} +EXPORT_SYMBOL(printk_deferred_enter); + +void printk_deferred_exit(void) +{ + this_cpu_write(printk_func, vprintk_default); + preempt_enable(); +} +EXPORT_SYMBOL(printk_deferred_exit); + /* * printk rate limiting, lifted from the networking subsystem. * -- 2.9.3 ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-19 6:32 ` Sergey Senozhatsky @ 2016-08-19 9:54 ` Petr Mladek 2016-08-19 19:00 ` Jan Kara 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-19 9:54 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote: > On (08/18/16 12:56), Petr Mladek wrote: > > The advantage of the printk_func trick is that it is transparent. > > You do not need to modify any existing functions used by WARN()/BUG() > > macros. > > good point. > > so something like below, perhaps. I'm less sure about > deferred BUG()/BUG_ON(): > > #define DEFERRED_BUG() do { \ > printk_deferred_enter(); \ > BUG(); \ > printk_deferred_exit(); \ > } while (0) \ > > #define DEFERRED_BUG_ON(condition) do { \ > printk_deferred_enter(); \ > BUG_ON(condition); \ > printk_deferred_exit(); \ > } while (0) > > depending on .config BUG() may never return back -- passing control > to do_exit(), so printk_deferred_exit() won't be executed. thus we > probably need to have a per-cpu variable that would indicate that > we are in deferred_bug. hm... but do we really need deferred BUG() > in the first place? Good question. I am not aware of any BUG_ON() that would be called from wake_up_process() but it is hard to check everything. A conservative approach would be to force synchronous printk from BUG_ON(). > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > --- > > include/asm-generic/bug.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++ > include/linux/printk.h | 6 ++++++ > kernel/printk/internal.h | 18 +---------------- > kernel/printk/printk.c | 42 ++++++++++++++++++++++++++++++++++----- > 4 files changed, 94 insertions(+), 22 deletions(-) > > diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h > index 6f96247..d72ee1e 100644 > --- a/include/asm-generic/bug.h > +++ b/include/asm-generic/bug.h > @@ -60,6 +60,10 @@ struct bug_entry { > * significant issues that need prompt attention if they should ever > * appear at runtime. Use the versions with printk format strings > * to provide better diagnostics. > + * > + * DEFERRED_WARN macros call printk_deferred() to print the messages > + * and are meant to be used from the contexts where direct printk() > + * can deadlock the system. > */ > #ifndef __WARN_TAINT > extern __printf(3, 4) > @@ -145,6 +149,52 @@ void __warn(const char *file, int line, void *caller, unsigned taint, > unlikely(__ret_warn_once); \ > }) > > +#define DEFERRED_WARN_ON(condition) ({ \ > + int __ret_warn_on = !!(condition); \ > + if (unlikely(__ret_warn_on)) { \ > + printk_deferred_enter(); \ > + __WARN(); \ > + printk_deferred_exit(); \ > + } \ > + unlikely(__ret_warn_on); \ > +}) This looks reasonable to me. > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 0d3e026..6e260a0 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > > +void printk_deferred_enter(void) > +{ > + preempt_disable(); > + this_cpu_write(printk_func, vprintk_deferred); > +} > +EXPORT_SYMBOL(printk_deferred_enter); > + > +void printk_deferred_exit(void) > +{ > + this_cpu_write(printk_func, vprintk_default); > + preempt_enable(); > +} > +EXPORT_SYMBOL(printk_deferred_exit); > + This is racy with printk_nmi_enter() and printk_nmi_exit(). It need to work both ways. It must keep printk_deferred() even when DEFERRED_WARN_ON() is called in nmi context. Also it must keep printk_deferred() when the DEFERRED_WARN_ON() is interrupted by an nmi. It think that best solution would be to allow nesting. What about replacing "printk_func" per-CPU variable with a per-CPU atomic counter. Then we could just check the counter in vprintk_emit() to see if it is deferred or not. The approach with printk_func() was more generic. We thought that it might be used even for a transparent early_printk(). But it still might be solved even with the per-CPU atomic counter. We could fallback to the early_printk when a global flag is set or so. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-19 9:54 ` Petr Mladek @ 2016-08-19 19:00 ` Jan Kara 2016-08-20 5:24 ` Sergey Senozhatsky 2016-08-23 13:48 ` Petr Mladek 0 siblings, 2 replies; 39+ messages in thread From: Jan Kara @ 2016-08-19 19:00 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Fri 19-08-16 11:54:55, Petr Mladek wrote: > On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote: > > On (08/18/16 12:56), Petr Mladek wrote: > > > The advantage of the printk_func trick is that it is transparent. > > > You do not need to modify any existing functions used by WARN()/BUG() > > > macros. > > > > good point. > > > > so something like below, perhaps. I'm less sure about > > deferred BUG()/BUG_ON(): > > > > #define DEFERRED_BUG() do { \ > > printk_deferred_enter(); \ > > BUG(); \ > > printk_deferred_exit(); \ > > } while (0) \ > > > > #define DEFERRED_BUG_ON(condition) do { \ > > printk_deferred_enter(); \ > > BUG_ON(condition); \ > > printk_deferred_exit(); \ > > } while (0) > > > > depending on .config BUG() may never return back -- passing control > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > probably need to have a per-cpu variable that would indicate that > > we are in deferred_bug. hm... but do we really need deferred BUG() > > in the first place? > > Good question. I am not aware of any BUG_ON() that would be called from > wake_up_process() but it is hard to check everything. > > A conservative approach would be to force synchronous printk from > BUG_ON(). Just a quick thought: Cannot we just do printk_deferred_enter() when we are about to call into the scheduler from printk code and printk_deferred_exit() when leaving it? That would look like the least error-prone way how handling this kind of recursion... OTOH there's also the other possible direction for the recursion when we are in the scheduler, holding some scheduler locks, decide to WARN which enters printk, that ends up calling wake_up_process() which deadlocks on scheduler locks... I don't see how to handle this type of recursion inside the printk code itself easily and so far the answer was - use printk_deferred() in the scheduler and don't use WARN... Hum, maybe we could add lockdep annotation to a WARN_ON and BUG_ON macros so that it would grab and release console_sem (even if the condition is false). That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON calls that could deadlock. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-19 19:00 ` Jan Kara @ 2016-08-20 5:24 ` Sergey Senozhatsky 2016-08-22 4:15 ` Sergey Senozhatsky 2016-08-23 13:03 ` Petr Mladek 2016-08-23 13:48 ` Petr Mladek 1 sibling, 2 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-20 5:24 UTC (permalink / raw) To: Jan Kara Cc: Petr Mladek, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/19/16 21:00), Jan Kara wrote: > > > depending on .config BUG() may never return back -- passing control > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > probably need to have a per-cpu variable that would indicate that > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > in the first place? > > > > Good question. I am not aware of any BUG_ON() that would be called from > > wake_up_process() but it is hard to check everything. > > > > A conservative approach would be to force synchronous printk from > > BUG_ON(). > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are > about to call into the scheduler from printk code and printk_deferred_exit() > when leaving it? That would look like the least error-prone way how > handling this kind of recursion... interesting idea. printk_deferred_enter() increments preempt count, so there may be additional obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate. need to think more. > OTOH there's also the other possible direction for the recursion when we > are in the scheduler, holding some scheduler locks, decide to WARN which > enters printk, that ends up calling wake_up_process() which deadlocks > on scheduler locks... I don't see how to handle this type of recursion > inside the printk code itself easily and so far the answer was - use > printk_deferred() in the scheduler and don't use WARN... the recursion detection is really tricky, yes. it seems (and I haven't thought of it good enough) to be a bit simpler when we operate in async printk mode, because we remove this uncontrollable console_unlock(). so we can do something like this: vprintk_emit(....) { local_irq_save(); if (this_cpu_read(in_printk)) { log_store(BUG: printk recursion!"); goto out; } this_cpu_write(in_printk) = 1; raw_spin_lock(&logbuf_lock); log_store(); raw_spin_unlock(&logbuf_lock); if (!in_sched) { if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH && can_printk_async()) { printk_kthread_need_flush_console = true; wake_up_process(printk_kthread); } } this_cpu_write(in_printk) = 0; out: local_irq_restore(); } async printk mode from this point of view is sort of atomic. we can even set different values of per-CPU `in_printk' on various stages of printk, which will permit to have better recursion handling. for example, if we recurse from raw_spin_unlock(&logbuf_lock) then we must re-init logbuf_lock, because it's 99% corrupted... and so on. but I haven't really thought of it yet. it obviously doesn't work for sync printk mode. > Hum, maybe we could add lockdep annotation to a WARN_ON and BUG_ON macros so > that it would grab and release console_sem (even if the condition is false). > That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON > calls that could deadlock. hm. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-20 5:24 ` Sergey Senozhatsky @ 2016-08-22 4:15 ` Sergey Senozhatsky 2016-08-23 12:19 ` Petr Mladek 2016-08-25 21:10 ` Petr Mladek 2016-08-23 13:03 ` Petr Mladek 1 sibling, 2 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-22 4:15 UTC (permalink / raw) To: Jan Kara, Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman Hello, On (08/20/16 14:24), Sergey Senozhatsky wrote: > On (08/19/16 21:00), Jan Kara wrote: > > > > depending on .config BUG() may never return back -- passing control > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > > probably need to have a per-cpu variable that would indicate that > > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > > in the first place? > > > > > > Good question. I am not aware of any BUG_ON() that would be called from > > > wake_up_process() but it is hard to check everything. > > > > > > A conservative approach would be to force synchronous printk from > > > BUG_ON(). > > > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are > > about to call into the scheduler from printk code and printk_deferred_exit() > > when leaving it? That would look like the least error-prone way how > > handling this kind of recursion... > > interesting idea. > printk_deferred_enter() increments preempt count, so there may be additional > obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate. > need to think more. so we probably can try something like this --- diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 2add7c5..b23f919 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2014,6 +2014,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) * set_current_state() the waiting thread does. */ smp_mb__before_spinlock(); + printk_deferred_enter(); raw_spin_lock_irqsave(&p->pi_lock, flags); if (!(p->state & state)) goto out; @@ -2073,6 +2074,7 @@ stat: ttwu_stat(p, cpu, wake_flags); out: raw_spin_unlock_irqrestore(&p->pi_lock, flags); + printk_deferred_exit(); return success; } --- since we are basically interested in wake_up_process() only from printk() POV. not sure how acceptable 2 * preempt_count and 2 * per-CPU writes for every try_to_wake_up(). the other thing I just thought of is doing something as follows !!!not tested, will not compile, just an idea!!! --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 6e260a0..bb8d719 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level, printk_delay(); local_irq_save(flags); + printk_nmi_enter(); this_cpu = smp_processor_id(); /* @@ -1804,6 +1805,7 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (!oops_in_progress && !lockdep_recursing(current)) { recursion_bug = true; + printk_nmi_exit(); local_irq_restore(flags); return 0; } @@ -1920,6 +1922,7 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + printk_nmi_exit(); local_irq_restore(flags); /* If called from the scheduler, we can not call up(). */ --- so may be we will not blow up in case of spin_dump() on logbuf_lock. well, if logbuf_lock was corrupted then nothing will help us -- it's over. but if raw_spin_lock(&logbuf_lock), for instance, spin_dump()-s because logbuf_lock was not released after `loops_per_jiffy * HZ' then deferred printing may help. probably we can make NMI printk_func to be more general way of using an alternative-printk buffer and, for example, rename API it to printk_alternative_enter() and printk_alternative_exit(), not to confuse anyone. (yes, NMIs are not available on every platform) and even more. we can start dump_stack() from recursion detection path to that alternative printk-buffer, which is a bit more helpful than "BUG: recent printk recursion!" !!!not tested, will not compile, just an idea!!! --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 6e260a0..ebce39a 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level, printk_delay(); local_irq_save(flags); + printk_nmi_enter(); this_cpu = smp_processor_id(); /* @@ -1804,6 +1805,8 @@ asmlinkage int vprintk_emit(int facility, int level, */ if (!oops_in_progress && !lockdep_recursing(current)) { recursion_bug = true; + WARN_ON(1); /* <<< dump_stack to alternative buffer */ + printk_nmi_exit(); local_irq_restore(flags); return 0; } @@ -1816,14 +1819,8 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = this_cpu; if (unlikely(recursion_bug)) { - static const char recursion_msg[] = - "BUG: recent printk recursion!"; - recursion_bug = false; - /* emit KERN_CRIT message */ - printed_len += log_store(0, 2, LOG_PREFIX|LOG_NEWLINE, 0, - NULL, 0, recursion_msg, - strlen(recursion_msg)); + printk_nmi_flush(); } nmi_message_lost = get_nmi_message_lost(); @@ -1920,6 +1917,7 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + printk_nmi_exit(); local_irq_restore(flags); /* If called from the scheduler, we can not call up(). */ --- just some quick thoughts. -ss ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-22 4:15 ` Sergey Senozhatsky @ 2016-08-23 12:19 ` Petr Mladek 2016-08-24 1:33 ` Sergey Senozhatsky 2016-08-25 21:10 ` Petr Mladek 1 sibling, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-23 12:19 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Mon 2016-08-22 13:15:20, Sergey Senozhatsky wrote: > Hello, > > On (08/20/16 14:24), Sergey Senozhatsky wrote: > > On (08/19/16 21:00), Jan Kara wrote: > > > > > depending on .config BUG() may never return back -- passing control > > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > > > probably need to have a per-cpu variable that would indicate that > > > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > > > in the first place? > > > > > > > > Good question. I am not aware of any BUG_ON() that would be called from > > > > wake_up_process() but it is hard to check everything. > > > > > > > > A conservative approach would be to force synchronous printk from > > > > BUG_ON(). > > > > > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are > > > about to call into the scheduler from printk code and printk_deferred_exit() > > > when leaving it? That would look like the least error-prone way how > > > handling this kind of recursion... > > > > interesting idea. > > printk_deferred_enter() increments preempt count, so there may be additional > > obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate. > > need to think more. > > the other thing I just thought of is doing something as follows > !!!not tested, will not compile, just an idea!!! > > --- > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 6e260a0..bb8d719 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level, > printk_delay(); > > local_irq_save(flags); > + printk_nmi_enter(); > this_cpu = smp_processor_id(); Huh, this looks very interesting but I am afraid that it will not fly. The problem is that vprintk_nmi() is safe only when it is used exclusively in NMI. The following could happen with your code: /**** normar context ****/ vprintk_emit() printk_nmi_enter() ... wake_up_process() WARN() printk() vprintk_nmi() vsnprintf(..., "0123456789") /* real NMI comes after writing "01234" */ /**** NMI context ****/ vprintk_nmi(); vsnprintf(..., "abcdefghijklmno"); /**** normal context ****/ /* we finish writing "56789" into the buffer */ => part of the message from NMI gets broken "abcde56789klmno". The lockless handling of the NMI per-CPU buffer already is not trivial. I would be afraid to add more hacks to make it writable in all contexts. I am sorry about the bad news. This was so promising on the first look. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-23 12:19 ` Petr Mladek @ 2016-08-24 1:33 ` Sergey Senozhatsky 0 siblings, 0 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-24 1:33 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/23/16 14:19), Petr Mladek wrote: > > the other thing I just thought of is doing something as follows > > !!!not tested, will not compile, just an idea!!! > > > > --- > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 6e260a0..bb8d719 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level, > > printk_delay(); > > > > local_irq_save(flags); > > + printk_nmi_enter(); > > this_cpu = smp_processor_id(); > > Huh, this looks very interesting but I am afraid that it will not fly. > The problem is that vprintk_nmi() is safe only when it is used > exclusively in NMI. > > The following could happen with your code: sure. I took NMI as a quick and easy example that did not require a lot of words to explain because you guys are familiar with NMI printing. the thing is - we don't have to share the same buffer with the actual NMI, but instead have our own small 'alternative-printk' per-cpu buffers (a page or two, perhaps). it's up to `printk_func' that we set up at the entry point and flush function that we execute later. vprintk_nmi() writes to this_cpu_ptr(&nmi_print_seq), vprintk_alter() will write to this_cpu_ptr(&alt_print_seq), for example. the same with flushing: printk_nmi_flush reads in from per_cpu(nmi_print_seq); alter-printk flush can read from per_cpu(alt_print_seq). [..] > The lockless handling of the NMI per-CPU buffer already is not > trivial. I would be afraid to add more hacks to make > it writable in all contexts. yes. we can reuse the idea of NMI printk and most of the code. the thing we really care is `struct nmi_seq_buf' assigned to this particular CPU, and there can be several seq_buffer-s. > I am sorry about the bad news. This was so promising on the first > look. it's not dead yet ;) -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-22 4:15 ` Sergey Senozhatsky 2016-08-23 12:19 ` Petr Mladek @ 2016-08-25 21:10 ` Petr Mladek 2016-08-26 1:56 ` Sergey Senozhatsky 1 sibling, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-25 21:10 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Mon 2016-08-22 13:15:20, Sergey Senozhatsky wrote: > Hello, > > On (08/20/16 14:24), Sergey Senozhatsky wrote: > > On (08/19/16 21:00), Jan Kara wrote: > > > > > depending on .config BUG() may never return back -- passing control > > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > > > probably need to have a per-cpu variable that would indicate that > > > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > > > in the first place? > > > > > since we are basically interested in wake_up_process() only from > printk() POV. not sure how acceptable 2 * preempt_count and 2 * per-CPU > writes for every try_to_wake_up(). > > > the other thing I just thought of is doing something as follows > !!!not tested, will not compile, just an idea!!! > > --- > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 6e260a0..bb8d719 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1789,6 +1789,7 @@ asmlinkage int vprintk_emit(int facility, int level, > printk_delay(); > > local_irq_save(flags); > + printk_nmi_enter(); > this_cpu = smp_processor_id(); > > /* > @@ -1804,6 +1805,7 @@ asmlinkage int vprintk_emit(int facility, int level, > */ > if (!oops_in_progress && !lockdep_recursing(current)) { > recursion_bug = true; > + printk_nmi_exit(); > local_irq_restore(flags); > return 0; > } > @@ -1920,6 +1922,7 @@ asmlinkage int vprintk_emit(int facility, int level, > logbuf_cpu = UINT_MAX; > raw_spin_unlock(&logbuf_lock); > lockdep_on(); > + printk_nmi_exit(); > local_irq_restore(flags); > > /* If called from the scheduler, we can not call up(). */ I was so taken by the idea of temporary forcing a lockless and "trivial" printk implementation that I missed one thing. Your patch use the alternative printk() variant around logbuf_lock. But this is not the problem with wake_up_process(). printk_deferred() takes logbuf_lock without problems. Our problem is with calling wake_up_process() recursively. The deadlock is in the scheduler locks. But the patch still inspired me. What about blocking the problematic wake_up_process() call by a per-cpu variable. I mean something like this completely untested code: diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ca9733b802ce..93915eb1fd0d 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1708,6 +1708,8 @@ static size_t cont_print_text(char *text, size_t size) return textlen; } +DEFINE_PER_CPU(bool, printk_wakeup); + asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) @@ -1902,8 +1904,17 @@ asmlinkage int vprintk_emit(int facility, int level, lockdep_off(); if (printk_kthread && !in_panic) { + bool __percpu *printk_wakeup_ptr; + /* Offload printing to a schedulable context. */ - wake_up_process(printk_kthread); + local_irq_save(flags); + printk_wake_up_ptr = this_cpu_ptr(&printk_wake_up); + if (!*printk_wakeup_ptr) { + *printk_wake_up_ptr = true; + wake_up_process(printk_kthread); + *printk_wake_up_ptr = false; + } + local_irq_restore(flags); goto out_lockdep; } else { /* We might eventually hide this into a wake_up_process_safe() or so. Also we might need to use it also in console_unlock() to avoid similar recursion there as well. Best Regards, Petr ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-25 21:10 ` Petr Mladek @ 2016-08-26 1:56 ` Sergey Senozhatsky 2016-08-26 8:20 ` Sergey Senozhatsky 2016-08-30 9:29 ` Petr Mladek 0 siblings, 2 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-26 1:56 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/25/16 23:10), Petr Mladek wrote: [..] > I was so taken by the idea of temporary forcing a lockless and > "trivial" printk implementation that I missed one thing. > > Your patch use the alternative printk() variant around logbuf_lock. > But this is not the problem with wake_up_process(). printk_deferred() > takes logbuf_lock without problems. you didn't miss anything, I think I wasn't too descriptive and that caused some confusion. this patch is not a replacement of wake_up_process() patch posted earlier in the loop, but an addition to it. not only every WARN/BUG issued from wake_up_process() will do no good, but every lock we take is potentially dangerous as well. In the simplest case because of $LOCK-debug.c files in kernel/locking (spin_lock in our case); in the worst case -- because of WARNs issued by log_store() and friends (there may be custom modifications) or by violations of spinlock atomicity requirements. For example, vprintk_emit() local_irq_save() raw_spin_lock() text_len = vscnprintf(text, sizeof(textbuf), fmt, args) { vsnprintf() { if (WARN_ON_ONCE(size > INT_MAX)) return 0; } } ... this is a rather unlikely event, sure, there must be some sort of memory corruption or something else, but the thing is -- if it will happen, printk() will not be willing to help. wake_up_process() change, posted earlier, is using a deferred version of WARN macro, but we definitely can (and we better do) switch to lockless alternative printk() in both cases and don't bother with new macros. replacing all of the existing ones with 'safe' deferred versions is a difficult task, but keeping track of a newly introduced ones is even harder (if possible at all). > +DEFINE_PER_CPU(bool, printk_wakeup); > + > asmlinkage int vprintk_emit(int facility, int level, > const char *dict, size_t dictlen, > const char *fmt, va_list args) > @@ -1902,8 +1904,17 @@ asmlinkage int vprintk_emit(int facility, int level, > lockdep_off(); > > if (printk_kthread && !in_panic) { > + bool __percpu *printk_wakeup_ptr; > + > /* Offload printing to a schedulable context. */ > - wake_up_process(printk_kthread); > + local_irq_save(flags); > + printk_wake_up_ptr = this_cpu_ptr(&printk_wake_up); > + if (!*printk_wakeup_ptr) { > + *printk_wake_up_ptr = true; > + wake_up_process(printk_kthread); > + *printk_wake_up_ptr = false; > + } > + local_irq_restore(flags); > goto out_lockdep; > } else { this can do, thanks. I would probably prefer, for the time being, to have a single mechanism that we will use in both cases. something like this: vprintk_emit() { alt_printk_enter(); ... log_store(); ... alt_printk_exit(); wakep_up_process() /* direct from async printk, or indirect from console_unlock()->up() */ alt_printk_enter(); ... enqueue task alt_printk_exit(); } and we need to have some sort of rollback to default printk() if BUG() goes to panic() (both on HAVE_ARCH_BUG and !HAVE_ARCH_BUG platforms): static void oops_end(...) { ... if (in_interrupt()) panic("Fatal exception in interrupt"); if (panic_on_oops) panic("Fatal exception"); if (signr) do_exit(signr); } not so sure about do_exit(). we are specifically talking here about wake_up_process()->try_to_wake_up(), which does all of its job under raw_spin_lock_irqsave(&p->pi_lock), so IF there is a BUG() that does do_exit() /* hard to imagine that */, then nothing will help us out, I think. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-26 1:56 ` Sergey Senozhatsky @ 2016-08-26 8:20 ` Sergey Senozhatsky 2016-08-30 9:29 ` Petr Mladek 1 sibling, 0 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-26 8:20 UTC (permalink / raw) To: Petr Mladek Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman, Sergey Senozhatsky On (08/26/16 10:56), Sergey Senozhatsky wrote: > but every lock we take is potentially dangerous as well. ... > vprintk_emit() > { > alt_printk_enter(); > ... > log_store(); > ... > alt_printk_exit(); > > wakep_up_process() /* direct from async printk, > or indirect from console_unlock()->up() */ > alt_printk_enter(); > ... enqueue task > alt_printk_exit(); > } OTOH, after a very quick thought, up() also takes a spin lock, which may spindump. so I'll probably prefer to keep the entire alt-printk thing entirely in printk(). something like this vprintk_emit() { alt_printk_enter() log_store() alt_printk_exit() if (async_printk) { alt_printk_enter() wake_up_process() alt_printk_exit() } else { if (console_trylock()) { console_unlock() { .... alt_printk_enter() up() alt_printk_exit() } } } } this leaves console_trylock() `unprotected'. so I guess I'll do something like this: } else { int ret; alt_printk_enter() ret = console_trylock(); alt_printk_exit() if (ret) console_unlock(); } a bit ugly, but well, it is what it is. will think more about it. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-26 1:56 ` Sergey Senozhatsky 2016-08-26 8:20 ` Sergey Senozhatsky @ 2016-08-30 9:29 ` Petr Mladek 2016-08-31 2:31 ` Sergey Senozhatsky 1 sibling, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-30 9:29 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Fri 2016-08-26 10:56:41, Sergey Senozhatsky wrote: > On (08/25/16 23:10), Petr Mladek wrote: > [..] > > I was so taken by the idea of temporary forcing a lockless and > > "trivial" printk implementation that I missed one thing. > > > > Your patch use the alternative printk() variant around logbuf_lock. > > But this is not the problem with wake_up_process(). printk_deferred() > > takes logbuf_lock without problems. > > you didn't miss anything, I think I wasn't too descriptive and that caused > some confusion. this patch is not a replacement of wake_up_process() patch > posted earlier in the loop, but an addition to it. not only every WARN/BUG > issued from wake_up_process() will do no good, but every lock we take is > potentially dangerous as well. In the simplest case because of $LOCK-debug.c > files in kernel/locking (spin_lock in our case); in the worst case -- > because of WARNs issued by log_store() and friends (there may be custom > modifications) or by violations of spinlock atomicity requirements. > > For example, > > vprintk_emit() > local_irq_save() > raw_spin_lock() > text_len = vscnprintf(text, sizeof(textbuf), fmt, args) > { > vsnprintf() > { > if (WARN_ON_ONCE(size > INT_MAX)) > return 0; > } > } > ... > > this is a rather unlikely event, sure, there must be some sort of > memory corruption or something else, but the thing is -- if it will > happen, printk() will not be willing to help. > > wake_up_process() change, posted earlier, is using a deferred version of > WARN macro, but we definitely can (and we better do) switch to lockless > alternative printk() in both cases and don't bother with new macros. > replacing all of the existing ones with 'safe' deferred versions is > a difficult task, but keeping track of a newly introduced ones is even > harder (if possible at all). I see. It makes some sense. I would like to be on the safe side. I am just afraid that adding yet another per-CPU buffer is too complex. It adds quite some complexity to the code. And it even more scatters the messages so that it will be harder to get them from the crash dump or flush them to the console when the system goes down. It took few years to get in the solution for NMIs even when it fixed real life deadlocks for many people and customers. I am afraid that it is not realistic to get in similar complex code to fix rather theoretical problems. Sigh, I waited few days with this comment. I do not want to sound like a broken record. I have hoped that anyone else could say an opinion. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-30 9:29 ` Petr Mladek @ 2016-08-31 2:31 ` Sergey Senozhatsky 2016-08-31 9:38 ` Petr Mladek 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-31 2:31 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/30/16 11:29), Petr Mladek wrote: > > you didn't miss anything, I think I wasn't too descriptive and that caused > > some confusion. this patch is not a replacement of wake_up_process() patch > > posted earlier in the loop, but an addition to it. not only every WARN/BUG > > issued from wake_up_process() will do no good, but every lock we take is > > potentially dangerous as well. In the simplest case because of $LOCK-debug.c > > files in kernel/locking (spin_lock in our case); in the worst case -- > > because of WARNs issued by log_store() and friends (there may be custom > > modifications) or by violations of spinlock atomicity requirements. > > > > For example, > > > > vprintk_emit() > > local_irq_save() > > raw_spin_lock() > > text_len = vscnprintf(text, sizeof(textbuf), fmt, args) > > { > > vsnprintf() > > { > > if (WARN_ON_ONCE(size > INT_MAX)) > > return 0; > > } > > } > > ... > > > > this is a rather unlikely event, sure, there must be some sort of > > memory corruption or something else, but the thing is -- if it will > > happen, printk() will not be willing to help. > > > > wake_up_process() change, posted earlier, is using a deferred version of > > WARN macro, but we definitely can (and we better do) switch to lockless > > alternative printk() in both cases and don't bother with new macros. > > replacing all of the existing ones with 'safe' deferred versions is > > a difficult task, but keeping track of a newly introduced ones is even > > harder (if possible at all). > > I see. It makes some sense. I would like to be on the safe side. I am > just afraid that adding yet another per-CPU buffer is too complex. > It adds quite some complexity to the code. And it even more scatters > the messages so that it will be harder to get them from the > crash dump or flush them to the console when the system goes down. > > It took few years to get in the solution for NMIs even when > it fixed real life deadlocks for many people and customers. > I am afraid that it is not realistic to get in similar complex > code to fix rather theoretical problems. well, I still can try it in my spare time. we can't fix printk() without ever touching it, can we? so far we basically only acknowledge the existing printk() problems. we can do better than that, I think. > Sigh, I waited few days with this comment. I do not want to sound > like a broken record. no worries -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-31 2:31 ` Sergey Senozhatsky @ 2016-08-31 9:38 ` Petr Mladek 2016-08-31 12:52 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-08-31 9:38 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Wed 2016-08-31 11:31:35, Sergey Senozhatsky wrote: > On (08/30/16 11:29), Petr Mladek wrote: > > > you didn't miss anything, I think I wasn't too descriptive and that caused > > > some confusion. this patch is not a replacement of wake_up_process() patch > > > posted earlier in the loop, but an addition to it. not only every WARN/BUG > > > issued from wake_up_process() will do no good, but every lock we take is > > > potentially dangerous as well. In the simplest case because of $LOCK-debug.c > > > files in kernel/locking (spin_lock in our case); in the worst case -- > > > because of WARNs issued by log_store() and friends (there may be custom > > > modifications) or by violations of spinlock atomicity requirements. > > > > > > For example, > > > > > > vprintk_emit() > > > local_irq_save() > > > raw_spin_lock() > > > text_len = vscnprintf(text, sizeof(textbuf), fmt, args) > > > { > > > vsnprintf() > > > { > > > if (WARN_ON_ONCE(size > INT_MAX)) > > > return 0; > > > } > > > } > > > ... > > > > > > this is a rather unlikely event, sure, there must be some sort of > > > memory corruption or something else, but the thing is -- if it will > > > happen, printk() will not be willing to help. > > > > > > wake_up_process() change, posted earlier, is using a deferred version of > > > WARN macro, but we definitely can (and we better do) switch to lockless > > > alternative printk() in both cases and don't bother with new macros. > > > replacing all of the existing ones with 'safe' deferred versions is > > > a difficult task, but keeping track of a newly introduced ones is even > > > harder (if possible at all). > > > > I see. It makes some sense. I would like to be on the safe side. I am > > just afraid that adding yet another per-CPU buffer is too complex. > > It adds quite some complexity to the code. And it even more scatters > > the messages so that it will be harder to get them from the > > crash dump or flush them to the console when the system goes down. > > > > It took few years to get in the solution for NMIs even when > > it fixed real life deadlocks for many people and customers. > > I am afraid that it is not realistic to get in similar complex > > code to fix rather theoretical problems. > > well, I still can try it in my spare time. we can't fix printk() without > ever touching it, can we? so far we basically only acknowledge the > existing printk() problems. we can do better than that, I think. Ah, I do not want to discourage you from finding a solution for these problems. I just wanted to point out problems with this particular path of thinking (more per-CPU buffers, shuffling data between them and the main buffer and console). But I might be wrong. Sigh, there are many problems with printk(). I think the we recently discussed the following problems: 1. Hung task or blocked irq handler when preemption/irqs are disabled and there are too many messages pushed to the console. 2. Potential deadlocks when calling wake_up_process() by async printk and console_unlock(). 3. Clean up the console handling to split manipulation with consoles setting and pushing the messages. By other words, allow to push the console messages only when wanted. 4. Messed output with continuous lines. They are related but only partly. IMHO, it is not realistic to fix all the problems in a single patchset. I wonder how to move forward. Our primary target was to solve the 1st problem with the async printk. It has stalled because we hit the other areas. Let's look at them from this point of view. Ad 2. The potential deadlock with wake_up_process(). It pooped up with using async printk during the suspend. But it is not new! up() called by console_unlock() has the same problem. I thought that it was different because console_trylock() would prevent recursion but I was wrong. There seems to be similar deadlock: console_unlock() up_console_sem() up() __up() raw_spin_lock_irqsave(&sem->lock, flags); wake_up_process() WARN() printk() vprintk_emit() console_trylock() down_trylock_console_sem() __down_trylock_console_sem)() down_trylock() raw_spin_lock_irqsave(&sem->lock, flags); DEADLOCK: because sem->lock is already taken by __up() We hit the deadlock quickly when the printk kthread used SCHED_FIFO. But it must be almost impossible to hit it (hit a WARN() with the normal scheduler. Otherwise, people would hit it also by console_unlock(). => If we use normal scheduler for the printk() thread, we should be on the safe side. The deadlock should get fixed but it will be enough to fix it later separately. Ad 3. The clean up of the console code might change some things. But it it is a huge task with unclear output. Anyway, we want to push messages to the console from printk(), so it will not help to avoid the async printk. It is a completely separate problem from my point of view. Ad 4. The problems with messaged continues lines got a bit more visible with the async printk. It is because a partially flushed cont buffer is blocked until the rest of the line is flushed. This happens more likely when the messages are flushed by a separate process. I have proposed a fix that read the end of the partially printed line from the main ring buffer. It made the code easier and might be a low risk change. In each case, I would not mix it with a perfect solution of the continuous lines. IMHO, it would be too complex and it is not worth it. Did I miss anything? I wonder how to separate the problems and make them more manageable. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-31 9:38 ` Petr Mladek @ 2016-08-31 12:52 ` Sergey Senozhatsky 2016-09-01 8:58 ` Petr Mladek 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-08-31 12:52 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (08/31/16 11:38), Petr Mladek wrote: [..] > Ah, I do not want to discourage you from finding a solution for these > problems. oh, Petr, I didn't mean it. that was a poor/incorrect wording on my side. > I just wanted to point out problems with this particular > path of thinking (more per-CPU buffers, shuffling data between > them and the main buffer and console). sure, and I do appreciate your input. that's a good summary. > Sigh, there are many problems with printk(). I think the we recently > discussed the following problems: > > 1. Hung task or blocked irq handler when preemption/irqs > are disabled and there are too many messages pushed to > the console. 1.1. deferred printing, for instance due to sched throttling 1.2. unlucky printk() from IRQ handler, that succeeded in grabbing the console semaphore via console_trylock(). once there is a moderate printk() flood and some of console_unlock() calls can be executed with preemption disabled, chances of sched throttling do increase. I've seen this many times in the past. > 2. Potential deadlocks when calling wake_up_process() by > async printk and console_unlock(). * there are many reasons to those recursive printk() calls -- some can be addressed, some cannot. for instance, it doesn't matter how many per-CPU buffers we use for alternative printk() once the logbuf_lock is corrupted. another `deadlock' example would be: SyS_ioctl do_vfs_ioctl tty_ioctl n_tty_ioctl tty_mode_ioctl set_termios tty_set_termios uart_set_termios uart_change_speed FOO_serial_set_termios spin_lock_irqsave(&port->lock) // lock the output port .... !! WARN() or pr_err() or printk() vprintk_emit() /* console_trylock() */ console_unlock() call_console_drivers() FOO_write() spin_lock_irqsave(&port->lock) // already locked > 3. Clean up the console handling to split manipulation with > consoles setting and pushing the messages. By other words, > allow to push the console messages only when wanted. yes, that's a hard one. not only console_unlock() may force a completely random task to do the printing loop, but console_lock() will force that task to stay in TASK_UNINTERRUPTIBLE as long as the semaphore is busy. if that task ends up being a user space process that must reply to systemd watchdog (user space) signals then things can get confusing. apart from the fact that that user space process can spend XX seconds printing kernel's messages from console_unlock(). even a single unfortunate console_trylock() and console_unlock() can lockup the system, etc. basically because they void the printk_kthread. so there are several outcomes from reworking console locking, but the results may vary. depending on particular setups. > 4. Messed output with continuous lines. 5. not 100% guaranteed printing on panic not entirely related to printk(), but to console output mechanism in general. we have console_flush_on_panic() which ignores console semaphore state, to increase our chances of seeing the backtrace. however, there are more that just one lock involved: logbuf_lock, serial driver locks. so we may start zap_locks() in console_flush_on_panic() to re-init the logbuf_lock, but underlying serial driver's locks are still in unclear state. most of the drivers (if not all of them) take the port->lock under disabled IRQs, so if panic-CPU is not the one that holds the port->lock then the port->lock owner CPU will probably unlock the spin_lock before processing its STOP_IPI. if it's the port->lock CPU that panic() the system (nmi_panic() or BUG()) then things can be bad. > Our primary target was to solve the 1st problem with the async printk. > It has stalled because we hit the other areas. Let's look at them > from this point of view. > > Ad 2. The potential deadlock with wake_up_process(). It pooped up > with using async printk during the suspend. right... and no per-CPU buffer could have helped here. > But it is not new! up() called by console_unlock() has the > same problem. I thought that it was different because > console_trylock() would prevent recursion but I was wrong. correct. Byungchul hit that problem awhile ago https://lkml.org/lkml/2016/2/17/102 > => If we use normal scheduler for the printk() thread, we should > be on the safe side. The deadlock should get fixed but it > will be enough to fix it later separately. agree. > Ad 4. The problems with messaged continues lines got a bit more > visible with the async printk. yep. I also agree that it might be not so serious to fix it now (if ever). > Did I miss anything? > > I wonder how to separate the problems and make them more manageable. so I was thinking for a moment about doing the recursion detection rework before the async_printk. just because better recursion detection is a nice thing to have in the first place and it probably may help us catching some of the surprises that async_printk might have. but it probably will be more problematic than I thought. then async_printk. I have a refreshed series on my hands, addressing Viresh's reports. it certainly makes things better, but it doesn't eliminate all of the lockups/etc sources. a console_unlock() doing wake_up_process(printk_kthread) would make it better. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-31 12:52 ` Sergey Senozhatsky @ 2016-09-01 8:58 ` Petr Mladek 2016-09-02 7:58 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-09-01 8:58 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote: > On (08/31/16 11:38), Petr Mladek wrote: > > 2. Potential deadlocks when calling wake_up_process() by > > async printk and console_unlock(). > > * there are many reasons to those recursive printk() calls -- some > can be addressed, some cannot. for instance, it doesn't matter how many > per-CPU buffers we use for alternative printk() once the logbuf_lock is > corrupted. Yup and BTW: Peter Zijlstra wants to avoid zapping locks whenever possible because it corrupts the state. It might solve the actual state but it might cause deadlock by the double unlock. > another `deadlock' example would be: > > SyS_ioctl > do_vfs_ioctl > tty_ioctl > n_tty_ioctl > tty_mode_ioctl > set_termios > tty_set_termios > uart_set_termios > uart_change_speed > FOO_serial_set_termios > spin_lock_irqsave(&port->lock) // lock the output port > .... > !! WARN() or pr_err() or printk() > vprintk_emit() > /* console_trylock() */ > console_unlock() > call_console_drivers() > FOO_write() > spin_lock_irqsave(&port->lock) // already > locked Great catch! From the already mentioned solutions, I would prefer using deferred variants of WARN()/BUG()/printk() on these locations. Together with using lockdep to find these locations. Also there is the Peter Zijlstra's idea of using a lockless "early" console to debug the situations where it happens. It might make sense to make such a console easy to use. I am unable to find any other generic solution that would prevent this from the printk() side at the moment. > 5. not 100% guaranteed printing on panic > not entirely related to printk(), but to console output mechanism in > general. we have console_flush_on_panic() which ignores console semaphore > state, to increase our chances of seeing the backtrace. however, there are > more that just one lock involved: logbuf_lock, serial driver locks. so we may > start zap_locks() in console_flush_on_panic() to re-init the logbuf_lock, > but underlying serial driver's locks are still in unclear state. most of > the drivers (if not all of them) take the port->lock under disabled IRQs, > so if panic-CPU is not the one that holds the port->lock then the port->lock > owner CPU will probably unlock the spin_lock before processing its STOP_IPI. > if it's the port->lock CPU that panic() the system (nmi_panic() or BUG()) > then things can be bad. That might be very hard to solve in general as well. Again the PeterZ's idea with the lockless console might help here. > > I wonder how to separate the problems and make them more manageable. > > so I was thinking for a moment about doing the recursion detection rework > before the async_printk. just because better recursion detection is a nice > thing to have in the first place and it probably may help us catching some > of the surprises that async_printk might have. but it probably will be more > problematic than I thought. > > then async_printk. I have a refreshed series on my hands, addressing > Viresh's reports. it certainly makes things better, but it doesn't > eliminate all of the lockups/etc sources. We must separate historical possible lockups and new regressions. Only regressions should block the async printk series. Old bugs should be fixed separately to keep the series manageable. Anyway, I think that the async printk will make sense even when we solve all the other issues. If async printk does not cause regressions, why not make it in. > a console_unlock() doing > wake_up_process(printk_kthread) would make it better. I am not sure what you mean by this. Thanks for working on it. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-09-01 8:58 ` Petr Mladek @ 2016-09-02 7:58 ` Sergey Senozhatsky 2016-09-02 15:15 ` Petr Mladek 0 siblings, 1 reply; 39+ messages in thread From: Sergey Senozhatsky @ 2016-09-02 7:58 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On (09/01/16 10:58), Petr Mladek wrote: > On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote: > > On (08/31/16 11:38), Petr Mladek wrote: > > > 2. Potential deadlocks when calling wake_up_process() by > > > async printk and console_unlock(). > > > > * there are many reasons to those recursive printk() calls -- some > > can be addressed, some cannot. for instance, it doesn't matter how many > > per-CPU buffers we use for alternative printk() once the logbuf_lock is > > corrupted. > > Yup and BTW: Peter Zijlstra wants to avoid zapping locks whenever > possible because it corrupts the state. It might solve the actual > state but it might cause deadlock by the double unlock. yes, don't really want to zap_locks() either. [..] > Great catch! From the already mentioned solutions, I would prefer > using deferred variants of WARN()/BUG()/printk() on these locations. > Together with using lockdep to find these locations. hmm... need to think more. one of the problems is that we would have to periodically "scan" for new WARNs/BUGs/etc doing all the types of random .configs > Also there is the Peter Zijlstra's idea of using a lockless > "early" console to debug the situations where it happens. > It might make sense to make such a console easy to use. aha, not really familiar with early console. > I am unable to find any other generic solution that would prevent this > from the printk() side at the moment. > > > 5. not 100% guaranteed printing on panic [..] > That might be very hard to solve in general as well. Again the PeterZ's > idea with the lockless console might help here. "need to google it". > > > I wonder how to separate the problems and make them more manageable. > > > > so I was thinking for a moment about doing the recursion detection rework > > before the async_printk. just because better recursion detection is a nice > > thing to have in the first place and it probably may help us catching some > > of the surprises that async_printk might have. but it probably will be more > > problematic than I thought. > > > > then async_printk. I have a refreshed series on my hands, addressing > > Viresh's reports. it certainly makes things better, but it doesn't > > eliminate all of the lockups/etc sources. > > We must separate historical possible lockups and new regressions. > Only regressions should block the async printk series. Old > bugs should be fixed separately to keep the series manageable. agree. > Anyway, I think that the async printk will make sense even > when we solve all the other issues. If async printk does not > cause regressions, why not make it in. sure. > > a console_unlock() doing > > wake_up_process(printk_kthread) would make it better. > > I am not sure what you mean by this. I meant that this thing local_irq_save() // or preempt_disable() ... if (console_trylock()) console_unlock(); ... local_irq_restore() // or preempt_enable() can easily lockup the system if console_trylock() was successful and there are enough messages to print. printk_kthread can't help, because here we basically enforce the `old' behavior. we have async printk, but not async console output. tweaking console_unlock() to offload the actual printing loop to printk_kthread would make the entire console output async: static void console_sync_flush_and_unlock(void) { for (;;) { ... call_console_drivers(); ... } } void console_unlock(void) { if (!MOTORMOUTH && can_printk_async()) { up(); wake_up_process(printk_kthread); return; } console_sync_flush_and_unlock(); } -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-09-02 7:58 ` Sergey Senozhatsky @ 2016-09-02 15:15 ` Petr Mladek 2016-09-06 7:16 ` Sergey Senozhatsky 0 siblings, 1 reply; 39+ messages in thread From: Petr Mladek @ 2016-09-02 15:15 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Fri 2016-09-02 16:58:08, Sergey Senozhatsky wrote: > On (09/01/16 10:58), Petr Mladek wrote: > > On Wed 2016-08-31 21:52:24, Sergey Senozhatsky wrote: > > > a console_unlock() doing > > > wake_up_process(printk_kthread) would make it better. > > > > I am not sure what you mean by this. > > I meant that this thing > > local_irq_save() // or preempt_disable() > ... > if (console_trylock()) > console_unlock(); > ... > local_irq_restore() // or preempt_enable() I see. > can easily lockup the system if console_trylock() was successful and there > are enough messages to print. printk_kthread can't help, because here we > basically enforce the `old' behavior. we have async printk, but not async > console output. tweaking console_unlock() to offload the actual printing loop > to printk_kthread would make the entire console output async: > > static void console_sync_flush_and_unlock(void) > { > for (;;) { > ... > call_console_drivers(); > ... > } > } > > void console_unlock(void) > { > if (!MOTORMOUTH && can_printk_async()) { > up(); > wake_up_process(printk_kthread); > return; > } > console_sync_flush_and_unlock(); > } Something like this would make sense. But I would do it in a separate patch(set). We need to go through all console_unlock() callers and make sure that they are fine with the potential async behavior. I would not complicate the async printk patchset by this. Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-09-02 15:15 ` Petr Mladek @ 2016-09-06 7:16 ` Sergey Senozhatsky 0 siblings, 0 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-09-06 7:16 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman Hello, On (09/02/16 17:15), Petr Mladek wrote: [..] > > I meant that this thing > > > > local_irq_save() // or preempt_disable() > > ... > > if (console_trylock()) > > console_unlock(); > > ... > > local_irq_restore() // or preempt_enable() > > I see. > > > can easily lockup the system if console_trylock() was successful and there > > are enough messages to print. printk_kthread can't help, because here we > > basically enforce the `old' behavior. we have async printk, but not async > > console output. tweaking console_unlock() to offload the actual printing loop > > to printk_kthread would make the entire console output async: > > > > static void console_sync_flush_and_unlock(void) > > { > > for (;;) { > > ... > > call_console_drivers(); > > ... > > } > > } > > > > void console_unlock(void) > > { > > if (!MOTORMOUTH && can_printk_async()) { > > up(); > > wake_up_process(printk_kthread); > > return; > > } > > console_sync_flush_and_unlock(); > > } > > Something like this would make sense. But I would do it in a separate > patch(set). We need to go through all console_unlock() callers and > make sure that they are fine with the potential async behavior. > I would not complicate the async printk patchset by this. sure. just added one more item to the list. -ss ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-20 5:24 ` Sergey Senozhatsky 2016-08-22 4:15 ` Sergey Senozhatsky @ 2016-08-23 13:03 ` Petr Mladek 1 sibling, 0 replies; 39+ messages in thread From: Petr Mladek @ 2016-08-23 13:03 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Sat 2016-08-20 14:24:30, Sergey Senozhatsky wrote: > On (08/19/16 21:00), Jan Kara wrote: > > > > depending on .config BUG() may never return back -- passing control > > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > > probably need to have a per-cpu variable that would indicate that > > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > > in the first place? > > > > > > Good question. I am not aware of any BUG_ON() that would be called from > > > wake_up_process() but it is hard to check everything. > > > > > > A conservative approach would be to force synchronous printk from > > > BUG_ON(). > > > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are > > about to call into the scheduler from printk code and printk_deferred_exit() > > when leaving it? That would look like the least error-prone way how > > handling this kind of recursion... > > interesting idea. > printk_deferred_enter() increments preempt count, so there may be additional > obstacles and, as a result, ad-hocs, that scheduler people will sincerely hate. > need to think more. I wonder if this would be acceptable at least for wake_up_process(). It seems to be the only scheduler function that we are interested in. And we might call it from vprintk > > OTOH there's also the other possible direction for the recursion when we > > are in the scheduler, holding some scheduler locks, decide to WARN which > > enters printk, that ends up calling wake_up_process() which deadlocks > > on scheduler locks... I don't see how to handle this type of recursion > > inside the printk code itself easily and so far the answer was - use > > printk_deferred() in the scheduler and don't use WARN... > > the recursion detection is really tricky, yes. it seems (and I haven't > thought of it good enough) to be a bit simpler when we operate in async > printk mode, because we remove this uncontrollable console_unlock(). > so we can do something like this: > > vprintk_emit(....) > { > local_irq_save(); > > if (this_cpu_read(in_printk)) { > log_store(BUG: printk recursion!"); > goto out; > } This does not quarantee that we have the logbug_lock. We might endup here from the raw_spin_lock() call and the lock might be owned by another CPU. I am afraid that we could only set some global variable here. > > this_cpu_write(in_printk) = 1; > > raw_spin_lock(&logbuf_lock); > log_store(); > raw_spin_unlock(&logbuf_lock); > > if (!in_sched) { > if (console_loglevel != CONSOLE_LOGLEVEL_MOTORMOUTH && > can_printk_async()) { > printk_kthread_need_flush_console = true; > wake_up_process(printk_kthread); > } > } > > this_cpu_write(in_printk) = 0; > out: > local_irq_restore(); > } > > async printk mode from this point of view is sort of atomic. This would prevent using printk_deferred() from the scheduler code. A solution would be to set the per-CPU variable only around the wake_up_process() call. Well, it is orthogonal to using printk_deferred_enter() around calling wake_up_process(). Best Regards, Petr ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v10 1/2] printk: Make printk() completely async 2016-08-19 19:00 ` Jan Kara 2016-08-20 5:24 ` Sergey Senozhatsky @ 2016-08-23 13:48 ` Petr Mladek 1 sibling, 0 replies; 39+ messages in thread From: Petr Mladek @ 2016-08-23 13:48 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Viresh Kumar, Andrew Morton, Sergey Senozhatsky, Jan Kara, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, vlevenetz, Greg Kroah-Hartman On Fri 2016-08-19 21:00:07, Jan Kara wrote: > On Fri 19-08-16 11:54:55, Petr Mladek wrote: > > On Fri 2016-08-19 15:32:36, Sergey Senozhatsky wrote: > > > On (08/18/16 12:56), Petr Mladek wrote: > > > > The advantage of the printk_func trick is that it is transparent. > > > > You do not need to modify any existing functions used by WARN()/BUG() > > > > macros. > > > > > > good point. > > > > > > so something like below, perhaps. I'm less sure about > > > deferred BUG()/BUG_ON(): > > > > > > #define DEFERRED_BUG() do { \ > > > printk_deferred_enter(); \ > > > BUG(); \ > > > printk_deferred_exit(); \ > > > } while (0) \ > > > > > > #define DEFERRED_BUG_ON(condition) do { \ > > > printk_deferred_enter(); \ > > > BUG_ON(condition); \ > > > printk_deferred_exit(); \ > > > } while (0) > > > > > > depending on .config BUG() may never return back -- passing control > > > to do_exit(), so printk_deferred_exit() won't be executed. thus we > > > probably need to have a per-cpu variable that would indicate that > > > we are in deferred_bug. hm... but do we really need deferred BUG() > > > in the first place? > > > > Good question. I am not aware of any BUG_ON() that would be called from > > wake_up_process() but it is hard to check everything. > > > > A conservative approach would be to force synchronous printk from > > BUG_ON(). > > Just a quick thought: Cannot we just do printk_deferred_enter() when we are > about to call into the scheduler from printk code and printk_deferred_exit() > when leaving it? That would look like the least error-prone way how > handling this kind of recursion... > > OTOH there's also the other possible direction for the recursion when we > are in the scheduler, holding some scheduler locks, decide to WARN which > enters printk, that ends up calling wake_up_process() which deadlocks > on scheduler locks... I don't see how to handle this type of recursion > inside the printk code itself easily and so far the answer was - use > printk_deferred() in the scheduler and don't use WARN... > > Hum, maybe we could add lockdep annotation to a WARN_ON and BUG_ON macros so > that it would grab and release console_sem (even if the condition is false). > That way we'd get lockdep splats for all the possible WARN_ON and BUG_ON > calls that could deadlock. The idea is interesting but I think that we do not want the fake grab/release of the console_sem. We use console_trylock() in vprintk_emit(). Please note the "try" variant. So it is safe to call a nested printk() from the console code. IMHO, we want to avoid calling console from the scheduler code because: 1. console is slow and we do not want to block the scheduler. 2. console_unlock() calls wake_up_process() and we do not want a deadlock by the scheduler locks. Therefore I think that we want to detect something specific from the scheduler that is also reachable from WARN()/printk(). Best Regards, Petr PS: My brain rotated several times this day around these problems. I hope that my opinion still makes some sense :-) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async 2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky 2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky @ 2016-04-04 16:57 ` Sergey Senozhatsky 1 sibling, 0 replies; 39+ messages in thread From: Sergey Senozhatsky @ 2016-04-04 16:57 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Byungchul Park, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara From: Jan Kara <jack@suse.cz> Offload printing of scheduler deferred messages from IRQ context to a schedulable printing kthread, when possible (the same way we do it in vprintk_emit()). Otherwise, the CPU can spend unbounded amount of time doing printing in console_unlock() from IRQ. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f63dfe4..c364f18 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2808,9 +2808,16 @@ static void wake_up_klogd_work_func(struct irq_work *irq_work) int pending = __this_cpu_xchg(printk_pending, 0); if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); + if (printk_kthread) { + wake_up_process(printk_kthread); + } else { + /* + * If trylock fails, someone else is doing + * the printing + */ + if (console_trylock()) + console_unlock(); + } } if (pending & PRINTK_PENDING_WAKEUP) -- 2.8.0 ^ permalink raw reply related [flat|nested] 39+ messages in thread
end of thread, other threads:[~2016-09-06 7:16 UTC | newest] Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-04-04 16:57 [PATCH v10 0/2] printk: Make printk() completely async Sergey Senozhatsky 2016-04-04 16:57 ` [PATCH v10 1/2] " Sergey Senozhatsky 2016-04-04 22:51 ` Andrew Morton 2016-04-05 5:17 ` Sergey Senozhatsky 2016-04-05 7:39 ` Sergey Senozhatsky 2016-04-06 0:19 ` Sergey Senozhatsky 2016-04-06 8:27 ` Jan Kara 2016-04-07 9:48 ` Sergey Senozhatsky 2016-04-07 12:08 ` Sergey Senozhatsky 2016-04-07 13:15 ` Jan Kara 2016-08-10 21:17 ` Viresh Kumar 2016-08-12 9:44 ` Petr Mladek 2016-08-15 14:26 ` Vladislav Levenetz 2016-08-16 9:04 ` Petr Mladek 2016-08-18 2:27 ` Sergey Senozhatsky 2016-08-18 9:33 ` Petr Mladek 2016-08-18 9:51 ` Sergey Senozhatsky 2016-08-18 10:56 ` Petr Mladek 2016-08-19 6:32 ` Sergey Senozhatsky 2016-08-19 9:54 ` Petr Mladek 2016-08-19 19:00 ` Jan Kara 2016-08-20 5:24 ` Sergey Senozhatsky 2016-08-22 4:15 ` Sergey Senozhatsky 2016-08-23 12:19 ` Petr Mladek 2016-08-24 1:33 ` Sergey Senozhatsky 2016-08-25 21:10 ` Petr Mladek 2016-08-26 1:56 ` Sergey Senozhatsky 2016-08-26 8:20 ` Sergey Senozhatsky 2016-08-30 9:29 ` Petr Mladek 2016-08-31 2:31 ` Sergey Senozhatsky 2016-08-31 9:38 ` Petr Mladek 2016-08-31 12:52 ` Sergey Senozhatsky 2016-09-01 8:58 ` Petr Mladek 2016-09-02 7:58 ` Sergey Senozhatsky 2016-09-02 15:15 ` Petr Mladek 2016-09-06 7:16 ` Sergey Senozhatsky 2016-08-23 13:03 ` Petr Mladek 2016-08-23 13:48 ` Petr Mladek 2016-04-04 16:57 ` [PATCH v10 2/2] printk: Make wake_up_klogd_work_func() async Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).