* [RFC][PATCH v2 0/2] printk: Make printk() completely async @ 2016-03-05 10:55 Sergey Senozhatsky 2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky 2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky 0 siblings, 2 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky Hello, RFC The patch set is a slightly updated Jan Kara's patches. The majority of the changes are in 0001 (described below in details). I've performed some tests and haven't spotted any problems, I'll certainly do more tests. Any feedback is appreciated. I had additional patches on top of the series, but I decided not to post them. I'm unhappy with console semaphore, this must be a separate mail loop and a patch set, some day. The patch set is against next-20160304 with reverted NMI patches. v2 (Sergey): - use dedicated printk workqueue with WQ_MEM_RECLAIM bit - fallback to system-wide workqueue only if allocation of printk_wq has failed - do not use system_wq as a fallback wq. both console_lock() and onsole_unlock() can spend a significant amount of time; so we need to use system_long_wq. - rework sync/!sync detection logic a) we can have deferred (in_sched) messages before we allocate printk_wq, so the only way to handle those messages is via IRQ context b) even in printk.synchronous mode, deferred messages must not be printed directly, and should go via IRQ context c) even if we allocated printk_wq and have !sync_printk mode, we must route deferred messages via IRQ context - so this adds additional bool flags to vprint_emit() and introduces a new pending bit to `printk_pending' - fix build on !PRINTK configs Jan Kara (2): printk: Make printk() completely async printk: Skip messages on oops Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 230 +++++++++++++++++++++++++++--------- 2 files changed, 181 insertions(+), 59 deletions(-) -- 2.8.0.rc0 ^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky @ 2016-03-05 10:55 ` Sergey Senozhatsky 2016-03-06 6:32 ` Sergey Senozhatsky 2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky 1 sibling, 1 reply; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara From: Jan Kara <jack@suse.cz> Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory). Thus by default printk() uses its own dedicated workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq (console_unlock() is time unbound) only if it has failed to allocate printk_wq. Another thing to mention, is that deferred printk() messages may appear before printk_wq is allocated, so in the very beginning we have to printk deferred messages the old way -- in IRQ context. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 194 +++++++++++++++++++++++++----------- 2 files changed, 146 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d5fd844..843f3fd 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include <linux/utsname.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/workqueue.h> #include <asm/uaccess.h> #include <asm-generic/sections.h> @@ -284,6 +285,108 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* + * When true, printing to console will happen synchronously unless someone else + * is already printing messages. + */ +static bool __read_mostly printk_sync; +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +static struct workqueue_struct *printk_wq; + +#define PRINTK_PENDING_WAKEUP (1<<0) +#define PRINTK_PENDING_IRQ_OUTPUT (1<<1) +#define PRINTK_PENDING_WQ_OUTPUT (1<<2) + +static DEFINE_PER_CPU(int, printk_pending); + +static int __init init_printk_workqueue(void) +{ + if (printk_sync) + return 0; + + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); + /* + * Fallback to one of system-wide workqueues if we failed to + * allocate a dedicated one. console_unlock() may take some + * time to flush all of the messages, so we better use + * system_long_wq. + */ + if (!printk_wq) { + pr_err("printk: Fallback to system_long workqueue\n"); + printk_wq = system_long_wq; + } + + return 0; +} +early_initcall(init_printk_workqueue); + +static void printing_work_func(struct work_struct *work) +{ + console_lock(); + console_unlock(); +} + +static DECLARE_WORK(printing_work, printing_work_func); + +static void wake_up_klogd_work_func(struct irq_work *irq_work) +{ + int pending = __this_cpu_xchg(printk_pending, 0); + + /* + * We just schedule regular work to do the printing from irq work. We + * don't want to do printing here directly as that happens with + * interrupts disabled and thus is bad for interrupt latency. We also + * don't want to queue regular work from vprintk_emit() as that gets + * called in various difficult contexts where schedule_work() could + * deadlock. + */ + if (pending & PRINTK_PENDING_WQ_OUTPUT) + queue_work(printk_wq, &printing_work); + + /* + * Deferred printk messages may come too early when we don't have + * workqueue allocated yet. We can't print deferred messages directly, + * because this may deadlock, so in the very beginning we have to + * print deferred messages from IRQ context. + */ + if (pending & PRINTK_PENDING_IRQ_OUTPUT) { + if (console_trylock()) + console_unlock(); + } + + if (pending & PRINTK_PENDING_WAKEUP) + wake_up_interruptible(&log_wait); +} + +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { + .func = wake_up_klogd_work_func, + .flags = IRQ_WORK_LAZY, +}; + +void wake_up_klogd(void) +{ + preempt_disable(); + if (waitqueue_active(&log_wait)) { + this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } + preempt_enable(); +} + +int printk_deferred(const char *fmt, ...) +{ + va_list args; + int r; + + va_start(args, fmt); + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + va_end(args); + + return r; +} + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1609,6 +1712,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1618,12 +1723,13 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; + bool sync_print = printk_sync; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; in_sched = true; + sync_print = false; } boot_delay_msec(level); @@ -1752,10 +1858,36 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + /* + * By default we print message to console asynchronously so that kernel + * doesn't get stalled due to slow serial console. That can lead to + * softlockups, lost interrupts, or userspace timing out under heavy + * printing load. + * + * However we resort to synchronous printing of messages during early + * boot, when synchronous printing was explicitly requested by + * kernel parameter, or when console_verbose() was called to print + * everything during panic / oops. + * + * Deferred messages (in_sched), however, must not be printed + * synchronously. + */ + if (!sync_print) { + if (printk_wq && !in_panic) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_WQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else if (in_sched) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_IRQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else { + sync_print = true; + } + } local_irq_restore(flags); - /* If called from the scheduler, we can not call up(). */ - if (!in_sched) { + if (sync_print) { lockdep_off(); /* * Try to acquire and then immediately release the console @@ -2724,60 +2856,6 @@ late_initcall(printk_late_init); #if defined CONFIG_PRINTK /* - * Delayed printk version, for scheduler-internal messages: - */ -#define PRINTK_PENDING_WAKEUP 0x01 -#define PRINTK_PENDING_OUTPUT 0x02 - -static DEFINE_PER_CPU(int, printk_pending); - -static void wake_up_klogd_work_func(struct irq_work *irq_work) -{ - int pending = __this_cpu_xchg(printk_pending, 0); - - if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); - } - - if (pending & PRINTK_PENDING_WAKEUP) - wake_up_interruptible(&log_wait); -} - -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { - .func = wake_up_klogd_work_func, - .flags = IRQ_WORK_LAZY, -}; - -void wake_up_klogd(void) -{ - preempt_disable(); - if (waitqueue_active(&log_wait)) { - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - } - preempt_enable(); -} - -int printk_deferred(const char *fmt, ...) -{ - va_list args; - int r; - - preempt_disable(); - va_start(args, fmt); - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - va_end(args); - - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - preempt_enable(); - - return r; -} - -/* * printk rate limiting, lifted from the networking subsystem. * * This enforces a rate limit: not more than 10 kernel messages -- 2.8.0.rc0 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky @ 2016-03-06 6:32 ` Sergey Senozhatsky 2016-03-06 7:18 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-06 6:32 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara On (03/05/16 19:55), Sergey Senozhatsky wrote: [..] > +static int __init init_printk_workqueue(void) > +{ > + if (printk_sync) > + return 0; > + > + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); > + /* > + * Fallback to one of system-wide workqueues if we failed to > + * allocate a dedicated one. console_unlock() may take some > + * time to flush all of the messages, so we better use > + * system_long_wq. > + */ > + if (!printk_wq) { > + pr_err("printk: Fallback to system_long workqueue\n"); > + printk_wq = system_long_wq; > + } > + > + return 0; > +} > +early_initcall(init_printk_workqueue); ^^^^^^ better be late_initcall(). init_workqueues(), which creates pwq_cache kmem_cache, is early initcall. ===8<===8<===8<==== >From d9cf33958febd863ceb25c59d447aa9500427c30 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Subject: [PATCH 1/2] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory). Thus by default printk() uses its own dedicated workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq (console_unlock() is time unbound) only if it has failed to allocate printk_wq. Another thing to mention, is that deferred printk() messages may appear before printk_wq is allocated, so in the very beginning we have to printk deferred messages the old way -- in IRQ context. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 194 +++++++++++++++++++++++++----------- 2 files changed, 146 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d5fd844..d287fd7 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include <linux/utsname.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/workqueue.h> #include <asm/uaccess.h> #include <asm-generic/sections.h> @@ -284,6 +285,108 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* + * When true, printing to console will happen synchronously unless someone else + * is already printing messages. + */ +static bool __read_mostly printk_sync; +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +static struct workqueue_struct *printk_wq; + +#define PRINTK_PENDING_WAKEUP (1<<0) +#define PRINTK_PENDING_IRQ_OUTPUT (1<<1) +#define PRINTK_PENDING_WQ_OUTPUT (1<<2) + +static DEFINE_PER_CPU(int, printk_pending); + +static int __init init_printk_workqueue(void) +{ + if (printk_sync) + return 0; + + printk_wq = alloc_workqueue("printk_wq", WQ_MEM_RECLAIM, 0); + /* + * Fallback to one of system-wide workqueues if we failed to + * allocate a dedicated one. console_unlock() may take some + * time to flush all of the messages, so we better use + * system_long_wq. + */ + if (!printk_wq) { + pr_err("printk: Fallback to system_long workqueue\n"); + printk_wq = system_long_wq; + } + + return 0; +} +late_initcall(init_printk_workqueue); + +static void printing_work_func(struct work_struct *work) +{ + console_lock(); + console_unlock(); +} + +static DECLARE_WORK(printing_work, printing_work_func); + +static void wake_up_klogd_work_func(struct irq_work *irq_work) +{ + int pending = __this_cpu_xchg(printk_pending, 0); + + /* + * We just schedule regular work to do the printing from irq work. We + * don't want to do printing here directly as that happens with + * interrupts disabled and thus is bad for interrupt latency. We also + * don't want to queue regular work from vprintk_emit() as that gets + * called in various difficult contexts where schedule_work() could + * deadlock. + */ + if (pending & PRINTK_PENDING_WQ_OUTPUT) + queue_work(printk_wq, &printing_work); + + /* + * Deferred printk messages may come too early when we don't have + * workqueue allocated yet. We can't print deferred messages directly, + * because this may deadlock, so in the very beginning we have to + * print deferred messages from IRQ context. + */ + if (pending & PRINTK_PENDING_IRQ_OUTPUT) { + if (console_trylock()) + console_unlock(); + } + + if (pending & PRINTK_PENDING_WAKEUP) + wake_up_interruptible(&log_wait); +} + +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { + .func = wake_up_klogd_work_func, + .flags = IRQ_WORK_LAZY, +}; + +void wake_up_klogd(void) +{ + preempt_disable(); + if (waitqueue_active(&log_wait)) { + this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } + preempt_enable(); +} + +int printk_deferred(const char *fmt, ...) +{ + va_list args; + int r; + + va_start(args, fmt); + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + va_end(args); + + return r; +} + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1609,6 +1712,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1618,12 +1723,13 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; + bool sync_print = printk_sync; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; in_sched = true; + sync_print = false; } boot_delay_msec(level); @@ -1752,10 +1858,36 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + /* + * By default we print message to console asynchronously so that kernel + * doesn't get stalled due to slow serial console. That can lead to + * softlockups, lost interrupts, or userspace timing out under heavy + * printing load. + * + * However we resort to synchronous printing of messages during early + * boot, when synchronous printing was explicitly requested by + * kernel parameter, or when console_verbose() was called to print + * everything during panic / oops. + * + * Deferred messages (in_sched), however, must not be printed + * synchronously. + */ + if (!sync_print) { + if (printk_wq && !in_panic) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_WQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else if (in_sched) { + __this_cpu_or(printk_pending, + PRINTK_PENDING_IRQ_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else { + sync_print = true; + } + } local_irq_restore(flags); - /* If called from the scheduler, we can not call up(). */ - if (!in_sched) { + if (sync_print) { lockdep_off(); /* * Try to acquire and then immediately release the console @@ -2724,60 +2856,6 @@ late_initcall(printk_late_init); #if defined CONFIG_PRINTK /* - * Delayed printk version, for scheduler-internal messages: - */ -#define PRINTK_PENDING_WAKEUP 0x01 -#define PRINTK_PENDING_OUTPUT 0x02 - -static DEFINE_PER_CPU(int, printk_pending); - -static void wake_up_klogd_work_func(struct irq_work *irq_work) -{ - int pending = __this_cpu_xchg(printk_pending, 0); - - if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); - } - - if (pending & PRINTK_PENDING_WAKEUP) - wake_up_interruptible(&log_wait); -} - -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { - .func = wake_up_klogd_work_func, - .flags = IRQ_WORK_LAZY, -}; - -void wake_up_klogd(void) -{ - preempt_disable(); - if (waitqueue_active(&log_wait)) { - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - } - preempt_enable(); -} - -int printk_deferred(const char *fmt, ...) -{ - va_list args; - int r; - - preempt_disable(); - va_start(args, fmt); - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - va_end(args); - - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - preempt_enable(); - - return r; -} - -/* * printk rate limiting, lifted from the networking subsystem. * * This enforces a rate limit: not more than 10 kernel messages -- 2.8.0.rc0.1.gd285ab0 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 6:32 ` Sergey Senozhatsky @ 2016-03-06 7:18 ` Tetsuo Handa 2016-03-06 9:35 ` Sergey Senozhatsky 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2016-03-06 7:18 UTC (permalink / raw) To: sergey.senozhatsky.work, akpm Cc: jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack Sergey Senozhatsky wrote: > printk() is expected to work under different conditions and in different > scenarios, including corner cases of OOM when all of the workers are busy > (e.g. allocating memory). Thus by default printk() uses its own dedicated > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq > (console_unlock() is time unbound) only if it has failed to allocate > printk_wq. Another thing to mention, is that deferred printk() messages > may appear before printk_wq is allocated, so in the very beginning we > have to printk deferred messages the old way -- in IRQ context. I think we should not depend on system_long_wq which does not have WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM), such systems won't be able to start userspace programs. Moreover, I don't like use of a workqueue even if printk_wq was allocated with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, the OOM reaper chose a dedicated kernel thread rather than a workqueue ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ). Blocking actual printing until ongoing workqueue item calls schedule_timeout_*() is not nice (see commit 373ccbe59270 and 564e81a57f97). Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, using a kernel thread does not change total number of task_struct compared to WQ_MEM_RECLAIM approach. I think actual printing should occur as soon as possible rather than randomly deferred until workqueue item sleeps. > +static void printing_work_func(struct work_struct *work) > +{ > + console_lock(); > + console_unlock(); > +} Is this safe? If somebody invokes the OOM killer between console_lock() and console_unlock(), won't this cause OOM killer messages not printed? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 7:18 ` Tetsuo Handa @ 2016-03-06 9:35 ` Sergey Senozhatsky 2016-03-06 11:06 ` Tetsuo Handa 0 siblings, 1 reply; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-06 9:35 UTC (permalink / raw) To: Tetsuo Handa Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack On (03/06/16 16:18), Tetsuo Handa wrote: > Sergey Senozhatsky wrote: > > printk() is expected to work under different conditions and in different > > scenarios, including corner cases of OOM when all of the workers are busy > > (e.g. allocating memory). Thus by default printk() uses its own dedicated > > workqueue with WQ_MEM_RECLAIM bit set. It falls back to system_long_wq > > (console_unlock() is time unbound) only if it has failed to allocate > > printk_wq. Another thing to mention, is that deferred printk() messages > > may appear before printk_wq is allocated, so in the very beginning we > > have to printk deferred messages the old way -- in IRQ context. > > I think we should not depend on system_long_wq which does not have > WQ_MEM_RECLAIM bit. If workqueue allocation upon boot fails (due to ENOMEM), > such systems won't be able to start userspace programs. well, yes. system_long_wq is a fallback, but probably just BUG_ON(!printk_wq) would be a better thing to do. > Moreover, I don't like use of a workqueue even if printk_wq was allocated > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, > the OOM reaper chose a dedicated kernel thread rather than a workqueue > ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ). > > Blocking actual printing until ongoing workqueue item calls schedule_timeout_*() > is not nice (see commit 373ccbe59270 and 564e81a57f97). printk and console_drivers are expected to be 'callable' from any context, including IRQs, or under spin_locks, etc. so neither part of printk->console_unlock()->console_driver->write() can sleep. console_driver->write() is quite often something like this foo_write() { spin_lock_irqsave(&port->lock); uart_console_write( ... foo_put_char); spin_unlock_iqrestore(&port->lock); } and foo_put_char(...), perhaps, waits for device to become ready transaction while (!(UART_GET_STATUS(port) & TXEMPTY)) cpu_relax(); and then sends out the character to the device UART_SET_DATA(port, (unsigned char)ch) so an already executing printk_wq item shouldn't block because of OOM happening on another CPU. do you mean a new worker allocation delay and a MAYDAY timer delay? > Use of WQ_MEM_RECLAIM means we add a task_struct for that workqueue. Thus, using > a kernel thread does not change total number of task_struct compared to WQ_MEM_RECLAIM > approach. I think actual printing should occur as soon as possible rather than randomly > deferred until workqueue item sleeps. hm, need to take a look. > > +static void printing_work_func(struct work_struct *work) > > +{ > > + console_lock(); > > + console_unlock(); > > +} > > Is this safe? If somebody invokes the OOM killer between console_lock() > and console_unlock(), won't this cause OOM killer messages not printed? between console_lock() and console_unlock() nothing can steal that lock from worker; so once a worker grabbed the console_lock() it will print (and console_lock() does not mean "lock logbuf", printk() from other CPUs can add messages to the logbuf while worker loops in console_unlock()). if you mean something like console_lock(); for (...) { do_foo() { ... pr_err(" ... foo message ...\n"); ... } } console_unlock(); then yes, nothing will be printed until that process executes console_unlock(), because it's console_unlock() that pushes the messages to console drivers. (console_lock()/console_unlock() have other problems and issues, which are not addressed in this patch set). there is only one function that ignores the state of console semaphore: panic()->console_flush_on_panic() but that's a separate thing, not related to this patch set. if you mean something else here, then please more details. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 9:35 ` Sergey Senozhatsky @ 2016-03-06 11:06 ` Tetsuo Handa 2016-03-06 13:27 ` Sergey Senozhatsky 0 siblings, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2016-03-06 11:06 UTC (permalink / raw) To: sergey.senozhatsky.work Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack Sergey Senozhatsky wrote: > On (03/06/16 16:18), Tetsuo Handa wrote: > > Moreover, I don't like use of a workqueue even if printk_wq was allocated > > with WQ_MEM_RECLAIM bit. As you can see in the discussion of the OOM reaper, > > the OOM reaper chose a dedicated kernel thread rather than a workqueue > > ( http://lkml.kernel.org/r/1454505240-23446-2-git-send-email-mhocko@kernel.org ). > > > > Blocking actual printing until ongoing workqueue item calls schedule_timeout_*() > > is not nice (see commit 373ccbe59270 and 564e81a57f97). > > do you mean a new worker allocation delay and a MAYDAY timer delay? > I don't know what MAYDAY is. I'm talking about a situation where printing_work work item is not processed (i.e. printing_work_func() is not called) until current work item calls schedule_timeout_*(). We had a problem that since vmstat_work work item was using system_wq, vmstat_work work item was not processed (i.e. vmstat_update() was not called) if kworker was looping inside memory allocator without calling schedule_timeout_*() due to disk_events_workfn() doing GFP_NOIO allocation). ---------- [ 271.579276] MemAlloc: kworker/0:56(7399) gfp=0x2400000 order=0 delay=129294 [ 271.581237] ffff88007c78fa08 ffff8800778f8c80 ffff88007c790000 ffff8800778f8c80 [ 271.583329] 0000000002400000 0000000000000000 ffff8800778f8c80 ffff88007c78fa20 [ 271.585391] ffffffff8162aa9d 0000000000000001 ffff88007c78fa30 ffffffff8162aac7 [ 271.587463] Call Trace: [ 271.588512] [<ffffffff8162aa9d>] preempt_schedule_common+0x18/0x2b [ 271.590243] [<ffffffff8162aac7>] _cond_resched+0x17/0x20 [ 271.591830] [<ffffffff8111fafe>] __alloc_pages_nodemask+0x64e/0xcc0 [ 271.593561] [<ffffffff8116a3b2>] ? __kmalloc+0x22/0x190 [ 271.595119] [<ffffffff81160ce7>] alloc_pages_current+0x87/0x110 [ 271.596778] [<ffffffff812e95f4>] bio_copy_kern+0xc4/0x180 [ 271.598342] [<ffffffff810a6a00>] ? wait_woken+0x80/0x80 [ 271.599878] [<ffffffff812f25f0>] blk_rq_map_kern+0x70/0x130 [ 271.601481] [<ffffffff812ece35>] ? blk_get_request+0x75/0xe0 [ 271.603100] [<ffffffff814433fd>] scsi_execute+0x12d/0x160 [ 271.604657] [<ffffffff81443524>] scsi_execute_req_flags+0x84/0xf0 [ 271.606339] [<ffffffffa01db742>] sr_check_events+0xb2/0x2a0 [sr_mod] [ 271.608141] [<ffffffff8109cbfc>] ? set_next_entity+0x6c/0x6a0 [ 271.609830] [<ffffffffa01cf163>] cdrom_check_events+0x13/0x30 [cdrom] [ 271.611610] [<ffffffffa01dbb85>] sr_block_check_events+0x25/0x30 [sr_mod] [ 271.613429] [<ffffffff812fc7eb>] disk_check_events+0x5b/0x150 [ 271.615065] [<ffffffff812fc8f1>] disk_events_workfn+0x11/0x20 [ 271.616699] [<ffffffff810827c5>] process_one_work+0x135/0x310 [ 271.618321] [<ffffffff81082abb>] worker_thread+0x11b/0x4a0 [ 271.620018] [<ffffffff810829a0>] ? process_one_work+0x310/0x310 [ 271.622022] [<ffffffff81087e53>] kthread+0xd3/0xf0 [ 271.623533] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0 [ 271.625487] [<ffffffff8162f09f>] ret_from_fork+0x3f/0x70 [ 271.627175] [<ffffffff81087d80>] ? kthread_create_on_node+0x1a0/0x1a0 ---------- Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, printing_work_func() will not be called until current work item calls schedule_timeout_*(). That will be an undesirable random delay. If you use a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, we can avoid this random delay. > > > +static void printing_work_func(struct work_struct *work) > > > +{ > > > + console_lock(); > > > + console_unlock(); > > > +} > > > > Is this safe? If somebody invokes the OOM killer between console_lock() > > and console_unlock(), won't this cause OOM killer messages not printed? > > if you mean something like > > console_lock(); > for (...) { > do_foo() { > ... > pr_err(" ... foo message ...\n"); > ... > } > } > console_unlock(); > > then yes, nothing will be printed until that process executes console_unlock(), > because it's console_unlock() that pushes the messages to console drivers. Yes, I meant a sequence like console_lock(); ptr = kmalloc(GFP_KERNEL); kfree(ptr); console_unlock(); and kmalloc() prints OOM killer messages rather than failing that allocation. Are we sure that there is no such usage? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 11:06 ` Tetsuo Handa @ 2016-03-06 13:27 ` Sergey Senozhatsky 2016-03-06 14:54 ` Tetsuo Handa 2016-03-07 8:22 ` Jan Kara 0 siblings, 2 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-06 13:27 UTC (permalink / raw) To: Tetsuo Handa Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack On (03/06/16 20:06), Tetsuo Handa wrote: [..] > > do you mean a new worker allocation delay and a MAYDAY timer delay? > > > > I don't know what MAYDAY is. I'm talking about a situation where printing_work > work item is not processed (i.e. printing_work_func() is not called) until > current work item calls schedule_timeout_*(). > > We had a problem that since vmstat_work work item was using system_wq, > vmstat_work work item was not processed (i.e. vmstat_update() was not called) if > kworker was looping inside memory allocator without calling schedule_timeout_*() > due to disk_events_workfn() doing GFP_NOIO allocation). hm, just for note, none of system-wide wqs seem to have a ->rescuer thread (WQ_MEM_RECLAIM). [..] > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > printing_work_func() will not be called until current work item calls > schedule_timeout_*(). That will be an undesirable random delay. If you use > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > we can avoid this random delay. hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. need to look more. [..] > > console_lock(); > > for (...) { > > do_foo() { > > ... > > pr_err(" ... foo message ...\n"); > > ... > > } > > } > > console_unlock(); > > > > then yes, nothing will be printed until that process executes console_unlock(), > > because it's console_unlock() that pushes the messages to console drivers. > > Yes, I meant a sequence like > > console_lock(); > ptr = kmalloc(GFP_KERNEL); > kfree(ptr); > console_unlock(); > > and kmalloc() prints OOM killer messages rather than failing that allocation. > Are we sure that there is no such usage? such usage is quite possible. problems that I have with console_lock()/console_unlock() is that these functions serve a double purpose: exclusive printk() lock and a console_drivers list lock. **** I haven't really thought about it yet, but I want to split it. **** console_lock()/console_unlock() can be executed by user space processes (inside system calls). for example: SyS_open() ... chrdev_open() tty_open() console_device() console_lock() console_unlock() for (;;) { call_console_drivers() } or doing `cat /proc/consoles` SyS_read() vfs_read() proc_reg_read() seq_read() c_stop() console_unlock() for (;;) { call_console_drivers() } which can introduce two nasty problems: 1) console_lock() may put user space process in TASK_UNINTERRUPTIBLE for unknown period of time -- until current console_sem owner will not finish print loop in console_unlock(). no signals, heartbeats, etc. will be processed by this user space process. 2) user space process may have to spend an unknown period of time in console_unlock() later, pushing "3rd party" kernel messages to console drivers. again, not really good. (kthreads can suffer here too, sure). in the examples above, a process just wanted to iterate the console_drivers list in read access mode. so, for instance, in struct tty_driver *console_device(int *index) { struct console *c; struct tty_driver *driver = NULL; console_lock(); for_each_console(c) { if (!c->device) continue; driver = c->device(c, index); if (driver) break; } console_unlock(); return driver; } instead of console_lock()/console_unlock()->call_console_drivers() it could do (very schematically) read_lock_console(); for_each_console(c) { if (!c->device) continue; driver = c->device(c, index); if (driver) break; } read_unlock_console(); and in functions that modify the list, the lock can be acquired in write mode. example, int unregister_console(struct console *console) { write_lock_console(); ... for (a=console_drivers->next, b=console_drivers ; a; b=a, a=b->next) { if (a == console) { b->next = a->next; res = 0; break; } } } ... if (console_drivers != NULL && console->flags & CON_CONSDEV) console_drivers->flags |= CON_CONSDEV; console->flags &= ~CON_ENABLED; write_unlock_unlock(); printk(), thus, will take its own "another" exclusive lock, to guarantee that only one process can call_console_drivers(), and it will take the console_drivers list lock in read mode. so other process(-es) that also want to access console_drivers list in read mode will not wait in TASK_UNINTERRUPTIBLE. of course, this means that console_unlock() now does not print anything to console drivers. it's printk() duty to do this. and the last statement can be very hard to sell; because I don't know for sure if there is a code in the kernel that depends on the fact that console_lock() stops printk() and that console_unlock() prints all of the printk messages. that "lock split" also can fix another *theoretical* scenario: suppose, that for some reason someone's setup has both a) huge number of printk() calls and b) a relatively huge number of console_lock()/console_unlock() calls, all happening simultaneously. while async printk helps in (a), case (b) "detours" the printk async code; and if cpus at lest some of the cpus that do console_lock()/console_unlock() also disable IRQs or preemption, then lockups and all the bad things are very much likely. just an idea. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 13:27 ` Sergey Senozhatsky @ 2016-03-06 14:54 ` Tetsuo Handa 2016-03-07 8:22 ` Jan Kara 1 sibling, 0 replies; 28+ messages in thread From: Tetsuo Handa @ 2016-03-06 14:54 UTC (permalink / raw) To: sergey.senozhatsky.work Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack Sergey Senozhatsky wrote: > such usage is quite possible. > > problems that I have with console_lock()/console_unlock() is that > these functions serve a double purpose: exclusive printk() lock and a > console_drivers list lock. Yes, I don't like it too. > > **** I haven't really thought about it yet, but I want to split it. **** > Since writing to console does not call schedule(), I think rcu_read_lock()/rcu_read_unlock()/synchronize_rcu() (or synchronize_rcu_*() ?) can manage it without using read_lock_console()/read_unlock_console()/write_lock_console()/write_unlock_console(). Replacing console_lock()/console_unlock() for protecting console_drivers list with RCU might be helpful. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-06 13:27 ` Sergey Senozhatsky 2016-03-06 14:54 ` Tetsuo Handa @ 2016-03-07 8:22 ` Jan Kara 2016-03-07 10:12 ` Sergey Senozhatsky 2016-03-07 15:42 ` Tejun Heo 1 sibling, 2 replies; 28+ messages in thread From: Jan Kara @ 2016-03-07 8:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky, jack On Sun 06-03-16 22:27:03, Sergey Senozhatsky wrote: > On (03/06/16 20:06), Tetsuo Handa wrote: > [..] > > > do you mean a new worker allocation delay and a MAYDAY timer delay? > > > > > > > I don't know what MAYDAY is. I'm talking about a situation where printing_work > > work item is not processed (i.e. printing_work_func() is not called) until > > current work item calls schedule_timeout_*(). > > > > We had a problem that since vmstat_work work item was using system_wq, > > vmstat_work work item was not processed (i.e. vmstat_update() was not called) if > > kworker was looping inside memory allocator without calling schedule_timeout_*() > > due to disk_events_workfn() doing GFP_NOIO allocation). > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > (WQ_MEM_RECLAIM). > > [..] > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > printing_work_func() will not be called until current work item calls > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > > we can avoid this random delay. > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > need to look more. Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up creating a worker process and wakes up rescuer thread. However I don't see that as a problem... > > > console_lock(); > > > for (...) { > > > do_foo() { > > > ... > > > pr_err(" ... foo message ...\n"); > > > ... > > > } > > > } > > > console_unlock(); > > > > > > then yes, nothing will be printed until that process executes console_unlock(), > > > because it's console_unlock() that pushes the messages to console drivers. > > > > Yes, I meant a sequence like > > > > console_lock(); > > ptr = kmalloc(GFP_KERNEL); > > kfree(ptr); > > console_unlock(); > > > > and kmalloc() prints OOM killer messages rather than failing that allocation. > > Are we sure that there is no such usage? > > such usage is quite possible. > > problems that I have with console_lock()/console_unlock() is that > these functions serve a double purpose: exclusive printk() lock and a > console_drivers list lock. Well, but changing how console locking works is a separate issue, isn't it? So please as a separate patch set if you want to try it. Actually I don't think changing the locking will be so easy. console_lock/unlock is used e.g. for console blanking where you need to block any printing while you call ->unblank() for each console. That being said I don't think improvement is impossible, just given my experiences with console / printk code there will be surprises waiting for you :). Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 8:22 ` Jan Kara @ 2016-03-07 10:12 ` Sergey Senozhatsky 2016-03-07 10:52 ` Jan Kara 2016-03-07 11:10 ` Tetsuo Handa 2016-03-07 15:42 ` Tejun Heo 1 sibling, 2 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-07 10:12 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky Hello, On (03/07/16 09:22), Jan Kara wrote: [..] > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > (WQ_MEM_RECLAIM). > > > > [..] > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > > printing_work_func() will not be called until current work item calls > > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > > > we can avoid this random delay. > > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > need to look more. > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > creating a worker process and wakes up rescuer thread. However I don't see > that as a problem... yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. I've slightly tested OOM-kill on my desktop system and haven't spotted any printk delays (well, a test on desktop is not really representative, of course). the only thing that so far grabbed my attention - is __this_cpu_or(printk_pending) irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); a _theoretical_ corner case here is when we have only one CPU doing a bunch of printk()s and this CPUs disables irqs in advance local_irq_save for (...) printk() local_irq_restore() if no other CPUs see `printk_pending' then nothing will be printed up until local_irq_restore() (assuming that IRQ disable time is withing the hardlockup detection threshold). if any other CPUs concurrently execute printk then we are fine, but a) if none -- then we probably have a small change in behaviour and b) UP systems [..] > > such usage is quite possible. > > > > problems that I have with console_lock()/console_unlock() is that > > these functions serve a double purpose: exclusive printk() lock and a > > console_drivers list lock. > > Well, but changing how console locking works is a separate issue, isn't it? > So please as a separate patch set if you want to try it. absolutely agree, this is a separate thing. > Actually I don't think changing the locking will be so easy. again, agree. splitting any lock is always tricky and risky. especially if we talk about console_sem. it can easily add up new deadlocks, make some fbcon unhappy, etc. etc. register_console() write_lock_console_lock() if (error) printk() printk_lock() read_lock_console_lock() <- eadlock and so on and so forth; I'm not very enthusiastic to change this at the moment. -ss > console_lock/unlock is used e.g. for console blanking where you need to > block any printing while you call ->unblank() for each console. That being > said I don't think improvement is impossible, just given my experiences > with console / printk code there will be surprises waiting for you :). > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 10:12 ` Sergey Senozhatsky @ 2016-03-07 10:52 ` Jan Kara 2016-03-07 12:16 ` Jan Kara 2016-03-07 14:40 ` Sergey Senozhatsky 2016-03-07 11:10 ` Tetsuo Handa 1 sibling, 2 replies; 28+ messages in thread From: Jan Kara @ 2016-03-07 10:52 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote: > Hello, > > On (03/07/16 09:22), Jan Kara wrote: > [..] > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > > (WQ_MEM_RECLAIM). > > > > > > [..] > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > > > printing_work_func() will not be called until current work item calls > > > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > > > > we can avoid this random delay. > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > > need to look more. > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > creating a worker process and wakes up rescuer thread. However I don't see > > that as a problem... > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > I've slightly tested OOM-kill on my desktop system and haven't spotted any > printk delays (well, a test on desktop is not really representative, of > course). > > > the only thing that so far grabbed my attention - is > > __this_cpu_or(printk_pending) > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > a _theoretical_ corner case here is when we have only one CPU doing a bunch > of printk()s and this CPUs disables irqs in advance > local_irq_save > for (...) > printk() > local_irq_restore() > > if no other CPUs see `printk_pending' then nothing will be printed up > until local_irq_restore() (assuming that IRQ disable time is withing > the hardlockup detection threshold). if any other CPUs concurrently > execute printk then we are fine, but > a) if none -- then we probably have a small change in behaviour > and > b) UP systems So for UP systems, we should by default disable async printing anyway I suppose. It is just a pointless overhead. So please just make printk_sync default to true if !CONFIG_SMP. When IRQs are disabled, you're right we will have a change in behavior. I don't see an easy way of avoiding delaying of printk until IRQs get enabled. I don't want to queue work directly because that creates possibility for lock recursion in queue_work(). And playing some tricks with irq_works isn't easy either - you cannot actually rely on any other CPU doing anything (even a timer tick) because of NOHZ. So if this will be a problem in practice, using a kthread will probably be the easiest solution. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 10:52 ` Jan Kara @ 2016-03-07 12:16 ` Jan Kara 2016-03-07 12:37 ` Tetsuo Handa ` (2 more replies) 2016-03-07 14:40 ` Sergey Senozhatsky 1 sibling, 3 replies; 28+ messages in thread From: Jan Kara @ 2016-03-07 12:16 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On Mon 07-03-16 11:52:48, Jan Kara wrote: > On Mon 07-03-16 19:12:33, Sergey Senozhatsky wrote: > > Hello, > > > > On (03/07/16 09:22), Jan Kara wrote: > > [..] > > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > > > (WQ_MEM_RECLAIM). > > > > > > > > [..] > > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > > > > printing_work_func() will not be called until current work item calls > > > > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > > > > > we can avoid this random delay. > > > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > > > need to look more. > > > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > > creating a worker process and wakes up rescuer thread. However I don't see > > > that as a problem... > > > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > > I've slightly tested OOM-kill on my desktop system and haven't spotted any > > printk delays (well, a test on desktop is not really representative, of > > course). > > > > > > the only thing that so far grabbed my attention - is > > > > __this_cpu_or(printk_pending) > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); > > > > a _theoretical_ corner case here is when we have only one CPU doing a bunch > > of printk()s and this CPUs disables irqs in advance > > local_irq_save > > for (...) > > printk() > > local_irq_restore() > > > > if no other CPUs see `printk_pending' then nothing will be printed up > > until local_irq_restore() (assuming that IRQ disable time is withing > > the hardlockup detection threshold). if any other CPUs concurrently > > execute printk then we are fine, but > > a) if none -- then we probably have a small change in behaviour > > and > > b) UP systems > > So for UP systems, we should by default disable async printing anyway I > suppose. It is just a pointless overhead. So please just make printk_sync > default to true if !CONFIG_SMP. > > When IRQs are disabled, you're right we will have a change in behavior. I > don't see an easy way of avoiding delaying of printk until IRQs get > enabled. I don't want to queue work directly because that creates > possibility for lock recursion in queue_work(). And playing some tricks > with irq_works isn't easy either - you cannot actually rely on any other > CPU doing anything (even a timer tick) because of NOHZ. > > So if this will be a problem in practice, using a kthread will probably be > the easiest solution. Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues create kthread anyway as a rescuer thread, it may be the simplest to just go back to using a single kthread for printing. What do you think? Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 12:16 ` Jan Kara @ 2016-03-07 12:37 ` Tetsuo Handa 2016-03-07 15:10 ` Sergey Senozhatsky 2016-03-09 6:09 ` Sergey Senozhatsky 2 siblings, 0 replies; 28+ messages in thread From: Tetsuo Handa @ 2016-03-07 12:37 UTC (permalink / raw) To: jack, sergey.senozhatsky.work Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky Jan Kara wrote: > > So for UP systems, we should by default disable async printing anyway I > > suppose. It is just a pointless overhead. So please just make printk_sync > > default to true if !CONFIG_SMP. > > > > When IRQs are disabled, you're right we will have a change in behavior. I > > don't see an easy way of avoiding delaying of printk until IRQs get > > enabled. I don't want to queue work directly because that creates > > possibility for lock recursion in queue_work(). And playing some tricks > > with irq_works isn't easy either - you cannot actually rely on any other > > CPU doing anything (even a timer tick) because of NOHZ. > > > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? > I think a plain kthread is better. printing_work_func() can become a buggy function. console_unlock() from printing_work_func() can occupy kworker for many seconds due to looping with cond_resched() (i.e. without calling schedule_timeout_*()) from schedulable context which can in turn defer other work items. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 12:16 ` Jan Kara 2016-03-07 12:37 ` Tetsuo Handa @ 2016-03-07 15:10 ` Sergey Senozhatsky 2016-03-07 15:49 ` Tejun Heo 2016-03-09 6:09 ` Sergey Senozhatsky 2 siblings, 1 reply; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-07 15:10 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On (03/07/16 13:16), Jan Kara wrote: [..] > > So for UP systems, we should by default disable async printing anyway I > > suppose. It is just a pointless overhead. So please just make printk_sync > > default to true if !CONFIG_SMP. > > > > When IRQs are disabled, you're right we will have a change in behavior. I > > don't see an easy way of avoiding delaying of printk until IRQs get > > enabled. I don't want to queue work directly because that creates > > possibility for lock recursion in queue_work(). And playing some tricks > > with irq_works isn't easy either - you cannot actually rely on any other > > CPU doing anything (even a timer tick) because of NOHZ. > > > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? A new version. Switched to [printk] kthread. Minimally tested only. ===8<===8<=== >From 6f697488b20b356e4103ddcc4f8b0ec1fa6a9531 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Mon, 7 Mar 2016 23:53:46 +0900 Subject: [PATCH 1/2] printk: Make printk() completely async Currently, printk() sometimes waits for message to be printed to console and sometimes it does not (when console_sem is held by some other process). In case printk() grabs console_sem and starts printing to console, it prints messages from kernel printk buffer until the buffer is empty. When serial console is attached, printing is slow and thus other CPUs in the system have plenty of time to append new messages to the buffer while one CPU is printing. Thus the CPU can spend unbounded amount of time doing printing in console_unlock(). This is especially serious problem if the printk() calling console_unlock() was called with interrupts disabled. In practice users have observed a CPU can spend tens of seconds printing in console_unlock() (usually during boot when hundreds of SCSI devices are discovered) resulting in RCU stalls (CPU doing printing doesn't reach quiescent state for a long time), softlockup reports (IPIs for the printing CPU don't get served and thus other CPUs are spinning waiting for the printing CPU to process IPIs), and eventually a machine death (as messages from stalls and lockups append to printk buffer faster than we are able to print). So these machines are unable to boot with serial console attached. Another observed issue is that due to slow printk, hardware discovery is slow and udev times out before kernel manages to discover all the attached HW. Also during artificial stress testing SATA disk disappears from the system because its interrupts aren't served for too long. This patch makes printk() completely asynchronous (similar to what printk_deferred() did until now). It appends message to the kernel printk buffer and queues work to do the printing to console. This has the advantage that printing always happens from a schedulable contex and thus we don't lockup any particular CPU or even interrupts. Also it has the advantage that printk() is fast and thus kernel booting is not slowed down by slow serial console. Disadvantage of this method is that in case of crash there is higher chance that important messages won't appear in console output (we may need working scheduling to print message to console). We somewhat mitigate this risk by switching printk to the original method of immediate printing to console if oops is in progress. Also for debugging purposes we provide printk.synchronous kernel parameter which resorts to the original printk behavior. printk() is expected to work under different conditions and in different scenarios, including corner cases of OOM when all of the workers are busy (e.g. allocating memory), thus printk() uses its own dedicated printing kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit set we potentially can receive delays in printing until workqueue declares a ->mayday, as noted by Tetsuo Handa). Another thing to mention, is that deferred printk() messages may appear before @printk_thread is allocated, so in the very beginning we have to print deferred messages the old way -- via IRQs. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- Documentation/kernel-parameters.txt | 10 ++ kernel/printk/printk.c | 193 +++++++++++++++++++++++++----------- 2 files changed, 145 insertions(+), 58 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e0a21e4..5b01941 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -3108,6 +3108,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted. printk.time= Show timing data prefixed to each printk message line Format: <bool> (1/Y/y=enable, 0/N/n=disable) + printk.synchronous= + By default kernel messages are printed to console + asynchronously (except during early boot or when oops + is happening). That avoids kernel stalling behind slow + serial console and thus avoids softlockups, interrupt + timeouts, or userspace timing out during heavy printing. + However for debugging problems, printing messages to + console immediately may be desirable. This option + enables such behavior. + processor.max_cstate= [HW,ACPI] Limit processor to maximum C-state max_cstate=9 overrides any DMI blacklist limit. diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index d5fd844..323566b 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -46,6 +46,7 @@ #include <linux/utsname.h> #include <linux/ctype.h> #include <linux/uio.h> +#include <linux/kthread.h> #include <asm/uaccess.h> #include <asm-generic/sections.h> @@ -284,6 +285,98 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN); static char *log_buf = __log_buf; static u32 log_buf_len = __LOG_BUF_LEN; +/* + * When true, printing to console will happen synchronously unless someone else + * is already printing messages. + * + * The default value on UP systems is 'true'. + */ +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP); +module_param_named(synchronous, printk_sync, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(synchronous, "make printing to console synchronous"); + +/* Printing kthread for async vprintk_emit() */ +static struct task_struct *printk_thread; +/* Wait for printing wakeups from async vprintk_emit() */ +static DECLARE_WAIT_QUEUE_HEAD(printing_wait); + +static int printing_func(void *data) +{ + while (1) { + DECLARE_WAITQUEUE(wait, current); + set_current_state(TASK_INTERRUPTIBLE); + add_wait_queue(&printing_wait, &wait); + + schedule(); + remove_wait_queue(&printing_wait, &wait); + + console_lock(); + console_unlock(); + } + + return 0; +} + +static int __init init_printk_thread(void) +{ + if (printk_sync) + return 0; + + printk_thread = kthread_run(printing_func, NULL, "printk"); + BUG_ON(IS_ERR(printk_thread)); + return 0; +} +late_initcall(init_printk_thread); + +/* + * Delayed printk version, for scheduler-internal messages: + */ +#define PRINTK_PENDING_WAKEUP (1<<0) +#define PRINTK_PENDING_OUTPUT (1<<1) + +static DEFINE_PER_CPU(int, printk_pending); + +static void wake_up_klogd_work_func(struct irq_work *irq_work) +{ + int pending = __this_cpu_xchg(printk_pending, 0); + + if (pending & PRINTK_PENDING_OUTPUT) { + /* If trylock fails, someone else is doing the printing */ + if (console_trylock()) + console_unlock(); + } + + if (pending & PRINTK_PENDING_WAKEUP) + wake_up_interruptible(&log_wait); +} + +static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { + .func = wake_up_klogd_work_func, + .flags = IRQ_WORK_LAZY, +}; + +void wake_up_klogd(void) +{ + preempt_disable(); + if (waitqueue_active(&log_wait)) { + this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } + preempt_enable(); +} + +int printk_deferred(const char *fmt, ...) +{ + va_list args; + int r; + + va_start(args, fmt); + r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); + va_end(args); + + return r; +} + /* Return log buffer address */ char *log_buf_addr_get(void) { @@ -1609,6 +1702,8 @@ asmlinkage int vprintk_emit(int facility, int level, const char *dict, size_t dictlen, const char *fmt, va_list args) { + /* cpu currently holding logbuf_lock in this function */ + static unsigned int logbuf_cpu = UINT_MAX; static bool recursion_bug; static char textbuf[LOG_LINE_MAX]; char *text = textbuf; @@ -1618,12 +1713,17 @@ asmlinkage int vprintk_emit(int facility, int level, int this_cpu; int printed_len = 0; bool in_sched = false; - /* cpu currently holding logbuf_lock in this function */ - static unsigned int logbuf_cpu = UINT_MAX; + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; + bool sync_print = printk_sync; if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; + /* + * Deferred sched messages must not be printed + * synchronously regardless the @printk_sync or @in_panic. + */ in_sched = true; + sync_print = false; } boot_delay_msec(level); @@ -1752,10 +1852,41 @@ asmlinkage int vprintk_emit(int facility, int level, logbuf_cpu = UINT_MAX; raw_spin_unlock(&logbuf_lock); lockdep_on(); + /* + * By default we print message to console asynchronously so that kernel + * doesn't get stalled due to slow serial console. That can lead to + * softlockups, lost interrupts, or userspace timing out under heavy + * printing load. + * + * However we resort to synchronous printing of messages during early + * boot, when synchronous printing was explicitly requested by + * kernel parameter, or when console_verbose() was called to print + * everything during panic / oops. + */ + if (!sync_print) { + if (printk_thread && !in_panic) { + /* + * Wakeup the printing kthread and offload printing + * to a schedulable context. + */ + wake_up(&printing_wait); + } else if (in_sched) { + /* + * @in_sched messages may come too early, when we don't + * yet have @printk_thread. We can't print deferred + * messages directly, because this may deadlock, route + * them via IRQ context. + */ + __this_cpu_or(printk_pending, + PRINTK_PENDING_OUTPUT); + irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); + } else { + sync_print = true; + } + } local_irq_restore(flags); - /* If called from the scheduler, we can not call up(). */ - if (!in_sched) { + if (sync_print) { lockdep_off(); /* * Try to acquire and then immediately release the console @@ -2724,60 +2855,6 @@ late_initcall(printk_late_init); #if defined CONFIG_PRINTK /* - * Delayed printk version, for scheduler-internal messages: - */ -#define PRINTK_PENDING_WAKEUP 0x01 -#define PRINTK_PENDING_OUTPUT 0x02 - -static DEFINE_PER_CPU(int, printk_pending); - -static void wake_up_klogd_work_func(struct irq_work *irq_work) -{ - int pending = __this_cpu_xchg(printk_pending, 0); - - if (pending & PRINTK_PENDING_OUTPUT) { - /* If trylock fails, someone else is doing the printing */ - if (console_trylock()) - console_unlock(); - } - - if (pending & PRINTK_PENDING_WAKEUP) - wake_up_interruptible(&log_wait); -} - -static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = { - .func = wake_up_klogd_work_func, - .flags = IRQ_WORK_LAZY, -}; - -void wake_up_klogd(void) -{ - preempt_disable(); - if (waitqueue_active(&log_wait)) { - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - } - preempt_enable(); -} - -int printk_deferred(const char *fmt, ...) -{ - va_list args; - int r; - - preempt_disable(); - va_start(args, fmt); - r = vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args); - va_end(args); - - __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT); - irq_work_queue(this_cpu_ptr(&wake_up_klogd_work)); - preempt_enable(); - - return r; -} - -/* * printk rate limiting, lifted from the networking subsystem. * * This enforces a rate limit: not more than 10 kernel messages -- 2.8.0.rc0 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 15:10 ` Sergey Senozhatsky @ 2016-03-07 15:49 ` Tejun Heo 2016-03-08 10:21 ` Sergey Senozhatsky 0 siblings, 1 reply; 28+ messages in thread From: Tejun Heo @ 2016-03-07 15:49 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, linux-kernel Hello, On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote: > A new version. Switched to [printk] kthread. There are some benefits to using a percpu workqueue with CPU_INTENSIVE set or an unbound workqueue. It'd need WQ_RESCUER so it'd still create a dedicated thread which is used under heavy memory pressure but workqueue will usaully give you local (cpu or node) worker. One reason to use kthread directly tho is minimizing the amount of dependency which is pretty important for printk. If we decide to use kthread instead of workqueue, let's please do it for the right reason. Thanks. -- tejun ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 15:49 ` Tejun Heo @ 2016-03-08 10:21 ` Sergey Senozhatsky 2016-03-11 17:22 ` Tejun Heo 0 siblings, 1 reply; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-08 10:21 UTC (permalink / raw) To: Tejun Heo Cc: Sergey Senozhatsky, Jan Kara, Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, linux-kernel Hello Tejun, On (03/07/16 10:49), Tejun Heo wrote: > On Tue, Mar 08, 2016 at 12:10:47AM +0900, Sergey Senozhatsky wrote: > > A new version. Switched to [printk] kthread. > > There are some benefits to using a percpu workqueue with CPU_INTENSIVE > set or an unbound workqueue. It'd need WQ_RESCUER so it'd still > create a dedicated thread which is used under heavy memory pressure > but workqueue will usaully give you local (cpu or node) worker. One > reason to use kthread directly tho is minimizing the amount of > dependency which is pretty important for printk. If we decide to use > kthread instead of workqueue, let's please do it for the right reason. thanks. I'd personally prefer to go with the "less dependency" option -- a dedicated kthread, I think. mostly for the sake of simplicity. I agree with the point that console_unlock() has unpredictable execution time, and in general case we would have a busy kworker (or sleeping in console_lock() or doing cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules that don't depend on printk. printk with dedicated printk-kthread seems easier to control. how does it sound? -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-08 10:21 ` Sergey Senozhatsky @ 2016-03-11 17:22 ` Tejun Heo 2016-03-12 5:01 ` Sergey Senozhatsky 0 siblings, 1 reply; 28+ messages in thread From: Tejun Heo @ 2016-03-11 17:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Jan Kara, Tetsuo Handa, akpm, jack, pmladek, linux-kernel Hello, Sergey. On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote: > I'd personally prefer to go with the "less dependency" option -- a dedicated > kthread, I think. mostly for the sake of simplicity. I agree with the point > that console_unlock() has unpredictable execution time, and in general case > we would have a busy kworker (or sleeping in console_lock() or doing > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules > that don't depend on printk. printk with dedicated printk-kthread seems > easier to control. how does it sound? I don't think it makes sense to avoid workqueue for execution latency. The only case which can matter is the rescuer case and as I wrote before the system is already in an extremely high latency mode by the time rescuer is needed, so it's unlikely to make noticeable differences. However, I agree that using kthread is a good idea here just to reduce the amount of dependency as prink working even during complex failures is important. workqueue itself is fairly complex and it also requires timer and task creation to work correctly for proper operation. That's a lot of extra dependency. Thanks. -- tejun ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-11 17:22 ` Tejun Heo @ 2016-03-12 5:01 ` Sergey Senozhatsky 0 siblings, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-12 5:01 UTC (permalink / raw) To: Tejun Heo Cc: Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara, Tetsuo Handa, akpm, jack, pmladek, linux-kernel Hello Tejun, On (03/11/16 12:22), Tejun Heo wrote: > On Tue, Mar 08, 2016 at 07:21:52PM +0900, Sergey Senozhatsky wrote: > > I'd personally prefer to go with the "less dependency" option -- a dedicated > > kthread, I think. mostly for the sake of simplicity. I agree with the point > > that console_unlock() has unpredictable execution time, and in general case > > we would have a busy kworker (or sleeping in console_lock() or doing > > cond_resched()) and an idle extra WQ_RESCUER kthread, with activation rules > > that don't depend on printk. printk with dedicated printk-kthread seems > > easier to control. how does it sound? > > I don't think it makes sense to avoid workqueue for execution latency. > The only case which can matter is the rescuer case and as I wrote > before the system is already in an extremely high latency mode by the > time rescuer is needed, so it's unlikely to make noticeable > differences. > > However, I agree that using kthread is a good idea here just to reduce > the amount of dependency as prink working even during complex failures > is important. workqueue itself is fairly complex and it also requires > timer and task creation to work correctly for proper operation. > That's a lot of extra dependency. Thanks! I agree that, in some cases (if not in most of them) the "value" of printk() output is inversely proportional to the system health -- the worst the state, the more attention people pay to printk() output; so a simpler solution here gives more confidence. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 12:16 ` Jan Kara 2016-03-07 12:37 ` Tetsuo Handa 2016-03-07 15:10 ` Sergey Senozhatsky @ 2016-03-09 6:09 ` Sergey Senozhatsky 2016-03-10 9:27 ` Jan Kara 2016-03-10 9:53 ` Petr Mladek 2 siblings, 2 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-09 6:09 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky Hello Jan, On (03/07/16 13:16), Jan Kara wrote: [..] > > So if this will be a problem in practice, using a kthread will probably be > > the easiest solution. > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > create kthread anyway as a rescuer thread, it may be the simplest to just > go back to using a single kthread for printing. What do you think? I have this patch on top of the series now. In short, it closes one more possibility of lockups -- console_lock()/console_unlock() calls. the patch splits console_unlock() in two parts: -- the fast one just wake up printing kthread -- the slow one does call_console_drivers() loop I think it sort of makes sense to tweak the patch below a bit and fold it into 0001, and move _some_ of the vprintk_emit() checks to console_unlock(). very schematically, after folding, vprintk_emit() will be if (in_sched) { if (!printk_sync && printk_thread) wake_up() else irq_work_queue() } if (!in_sched) if (console_trylock()) console_unlock() and console_unlock() will be if (!in_panic && !printk_sync && printk_thread) { up_console_sem() wake_up() } else { console_unlock_for_printk() } console_unlock_for_printk() does the call_console_drivers() loop. console_flush_on_panic() and printing_func() call console_unlock_for_printk() directly. What do you think? Or would you prefer to first introduce async printk() rework, and move to console_unlock() in vprintk_emit() one release cycle later? IOW, in 3 steps: -- first make printk() async -- then console_unlock() async, and use console_unlock_for_printk() in vprintk_emit() -- then switch to console_unlock() in vprintk_emit(). below is the patch which introduces console_unlock_for_printk(). not the squashed console_unlock_for_printk() and 0001. -ss ====== >From bc3932c68c5afb9bf43af98335c705c75067a93a Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH 3/4] printk: introduce console_unlock_for_printk() Even though we already have asynchronous printk()->vprintk_emit(), there are still good chances to get lockups, because we don't have asynchronous console_unlock(). So any process doing console_lock() and console_unlock() will end up looping in console_unlock(), pushing the messages to console drivers (possibly with IRQs or preemption disabled), regardless the fact that we have a dedicated kthread for that particular job. Apart from that, console_lock()/console_unlock() can be executed by user processes as a part of system calls: a) SyS_open() ... chrdev_open() tty_open() console_device() console_lock() console_unlock() for (;;) { call_console_drivers() } b) SyS_read() ... sysfs_read_file() dev_attr_show() show_cons_active() console_lock() console_unlock() for (;;) { call_console_drivers() } c) doing `cat /proc/consoles` SyS_read() vfs_read() proc_reg_read() seq_read() c_stop() console_unlock() for (;;) { call_console_drivers() } etc. This can add unnecessary latencies to the user space processes. This patch splits console_unlock() in two parts: -- the fast path up() console semaphore and wake up printing kthread (if there is one, of course), otherwise -- the slow path: does what console_unlock() did previously, emit the messages and then up() console semaphore The actual printing loop is, thus, moved to a new function, console_unlock_for_printk(). There are 3 places that unconditionally call it: a) direct printing from vprintk_emit() b) console_flush_on_panic() c) printing kthread callback Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- kernel/printk/printk.c | 51 +++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index de45d86..ddaf62e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -303,6 +303,8 @@ static struct task_struct *printk_thread; /* Wait for printing wakeups from async vprintk_emit() */ static DECLARE_WAIT_QUEUE_HEAD(printing_wait); +static void console_unlock_for_printk(void); + static int printing_func(void *data) { while (1) { @@ -314,7 +316,7 @@ static int printing_func(void *data) remove_wait_queue(&printing_wait, &wait); console_lock(); - console_unlock(); + console_unlock_for_printk(); } return 0; @@ -1900,7 +1902,7 @@ asmlinkage int vprintk_emit(int facility, int level, * /dev/kmsg and syslog() users. */ if (console_trylock()) - console_unlock(); + console_unlock_for_printk(); lockdep_on(); } @@ -2339,20 +2341,20 @@ out: #define PRINT_MSGS_BEFORE_OOPS 100 /** - * console_unlock - unlock the console system + * console_unlock_for_printk - unlock the console system * * Releases the console_lock which the caller holds on the console system * and the console driver list. * * While the console_lock was held, console output may have been buffered - * by printk(). If this is the case, console_unlock(); emits + * by printk(). If this is the case, console_unlock_for_printk() emits * the output prior to releasing the lock. * * If there is output waiting, we wake /dev/kmsg and syslog() users. * - * console_unlock(); may be called from any context. + * console_unlock_for_printk() may be called from any context. */ -void console_unlock(void) +static void console_unlock_for_printk(void) { static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[LOG_LINE_MAX + PREFIX_MAX]; @@ -2511,6 +2513,41 @@ skip: if (wake_klogd) wake_up_klogd(); } + + +/** + * console_unlock - unlock the console system + * + * Releases the console_lock which the caller holds on the console system. + * + * The fast path is to wake up the printing kthread (if the system can + * perform asynchronous printing) and return; the slow path is to emit + * the messages directly invoking console_unlock_for_printk(). + * + * console_unlock() may be called from any context. + */ +void console_unlock(void) +{ + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH; + + if (in_panic) { + /* + * If the system is in panic console_flush_on_panic() issued + * from panic_cpu will flush the messages. + */ + console_locked = 0; + up_console_sem(); + return; + } + + if (!printk_sync && printk_thread) { + console_locked = 0; + up_console_sem(); + wake_up(&printing_wait); + } else { + console_unlock_for_printk(); + } +} EXPORT_SYMBOL(console_unlock); /** @@ -2567,7 +2604,7 @@ void console_flush_on_panic(void) */ console_trylock(); console_may_schedule = 0; - console_unlock(); + console_unlock_for_printk(); } /* -- 2.8.0.rc0.1.gd285ab0 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-09 6:09 ` Sergey Senozhatsky @ 2016-03-10 9:27 ` Jan Kara 2016-03-10 15:48 ` Sergey Senozhatsky 2016-03-10 9:53 ` Petr Mladek 1 sibling, 1 reply; 28+ messages in thread From: Jan Kara @ 2016-03-10 9:27 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On Wed 09-03-16 15:09:50, Sergey Senozhatsky wrote: > Hello Jan, > > On (03/07/16 13:16), Jan Kara wrote: > [..] > > > So if this will be a problem in practice, using a kthread will probably be > > > the easiest solution. > > > > Hum, and thinking more about it: Considering that WQ_MEM_RECLAIM workqueues > > create kthread anyway as a rescuer thread, it may be the simplest to just > > go back to using a single kthread for printing. What do you think? > > I have this patch on top of the series now. In short, it closes one more > possibility of lockups -- console_lock()/console_unlock() calls. the patch > splits console_unlock() in two parts: > -- the fast one just wake up printing kthread > -- the slow one does call_console_drivers() loop > > I think it sort of makes sense to tweak the patch below a bit and fold it > into 0001, and move _some_ of the vprintk_emit() checks to console_unlock(). > > very schematically, after folding, vprintk_emit() will be > > if (in_sched) { > if (!printk_sync && printk_thread) > wake_up() > else > irq_work_queue() > } > > if (!in_sched) > if (console_trylock()) > console_unlock() > > and console_unlock() will be > > if (!in_panic && !printk_sync && printk_thread) { > up_console_sem() > wake_up() > } else { > console_unlock_for_printk() > } > > console_unlock_for_printk() does the call_console_drivers() loop. > > console_flush_on_panic() and printing_func() call console_unlock_for_printk() > directly. > > > What do you think? Or would you prefer to first introduce async > printk() rework, and move to console_unlock() in vprintk_emit() > one release cycle later? > IOW, in 3 steps: > -- first make printk() async > -- then console_unlock() async, and use console_unlock_for_printk() in > vprintk_emit() > -- then switch to console_unlock() in vprintk_emit(). > > > below is the patch which introduces console_unlock_for_printk(). > not the squashed console_unlock_for_printk() and 0001. So I think this should definitely stay as a separate patch since it possibly changes user visible behavior and sometimes blocking may be actually desirable for userspace. I don't have that strong opinion whether it should be in a separate patch set or part of this one. Maybe a separate patch set would be somewhat better so that we first hash out possible issues with async printk and once that's settled we start messing more with the code changing the behavior of console_unlock() as well. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-10 9:27 ` Jan Kara @ 2016-03-10 15:48 ` Sergey Senozhatsky 0 siblings, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-10 15:48 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On (03/10/16 10:27), Jan Kara wrote: [..] > So I think this should definitely stay as a separate patch since it > possibly changes user visible behavior and sometimes blocking may be > actually desirable for userspace. I don't have that strong opinion whether > it should be in a separate patch set or part of this one. Maybe a separate > patch set would be somewhat better so that we first hash out possible issues > with async printk and once that's settled we start messing more with the > code changing the behavior of console_unlock() as well. agree, thanks. I'll split the series and submit console_unlock() rework separately. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-09 6:09 ` Sergey Senozhatsky 2016-03-10 9:27 ` Jan Kara @ 2016-03-10 9:53 ` Petr Mladek 2016-03-10 16:26 ` Sergey Senozhatsky 1 sibling, 1 reply; 28+ messages in thread From: Petr Mladek @ 2016-03-10 9:53 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Jan Kara, Tetsuo Handa, akpm, jack, tj, linux-kernel, sergey.senozhatsky On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote: > On (03/07/16 13:16), Jan Kara wrote: > What do you think? Or would you prefer to first introduce async > printk() rework, and move to console_unlock() in vprintk_emit() > one release cycle later? > IOW, in 3 steps: > -- first make printk() async > -- then console_unlock() async, and use console_unlock_for_printk() in > vprintk_emit() > > -- then switch to console_unlock() in vprintk_emit(). I would sort this by priorities. I know about real-world problems that will get solved by async printk. I haven't heard yet people complaining about blocked console_lock()/console_unlock() calls outside printk code. So, I would personally prefer to handle async printk first. Heh, you opened an interesting can of worms. There are definitely locations that just want to manipulate the list of consoles and their setting without the need to push the date. I wonder how many locations really need to push the data. Note that console_unlock_for_printk() might be a bit misleading. Especially when you suggest to replace it by console_unlock() in vprintk_emit() ;-) I wonder if console_flush_and_unlock() might be more descriptive. We might even split flush_console() into a separate function in the end. I think that the combination with unlock() is there to make sure that somebody will flush the last messages from printk(), see the retry stuff. It probably won't be needed with the asynch printk(). Anyway, all these console_unlock() changes looks like another big step and I suggest to do it separately. I looking forward to see where this ends. Thanks, Petr PS: I want to check more precisely the async printk patchset but I am repeatedy sidetracked this week :-( ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-10 9:53 ` Petr Mladek @ 2016-03-10 16:26 ` Sergey Senozhatsky 0 siblings, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-10 16:26 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Jan Kara, Tetsuo Handa, akpm, jack, tj, linux-kernel, sergey.senozhatsky On (03/10/16 10:53), Petr Mladek wrote: [..] > On Wed 2016-03-09 15:09:50, Sergey Senozhatsky wrote: > > On (03/07/16 13:16), Jan Kara wrote: > > What do you think? Or would you prefer to first introduce async > > printk() rework, and move to console_unlock() in vprintk_emit() > > one release cycle later? > > IOW, in 3 steps: > > -- first make printk() async > > -- then console_unlock() async, and use console_unlock_for_printk() in > > vprintk_emit() > > > > -- then switch to console_unlock() in vprintk_emit(). > > I would sort this by priorities. I agree, let's settle down async printk() first. > I know about real-world problems that will get solved by > async printk. I haven't heard yet people complaining about > blocked console_lock()/console_unlock() calls outside printk > code. So, I would personally prefer to handle async printk > first. well, I see some problems with console_lock()/console_unlock() :) > Heh, you opened an interesting can of worms. There are definitely > locations that just want to manipulate the list of consoles and > their setting without the need to push the date. I wonder how > many locations really need to push the data. I've tested it briefly on some of the setups that I have around, and the boot time reduced by (very roughly) ~20+%; systemd and friends do a number of tty/etc. calls, and stuck in console_unock() each time. of course, the "pre-condition" here are printk()s from drivers/etc. (frequent enough to keep call_console_drivers() busy, not necessarily "pressure"). even on my laptop, userspace does a ton of console_unlock() ... [<ffffffff8108a904>] console_unlock+0x24/0x89 [<ffffffff8108ba76>] console_device+0x4a/0x54 [<ffffffff81261fbb>] tty_open+0x127/0x4c5 [<ffffffff81145316>] chrdev_open+0x13f/0x164 [<ffffffff811451d7>] ? cdev_put+0x23/0x23 [<ffffffff8113fc88>] do_dentry_open.isra.1+0x1b3/0x29e [<ffffffff81140791>] vfs_open+0x53/0x58 [<ffffffff8114e40d>] path_openat+0xa37/0xc8c [<ffffffff8114f2a2>] do_filp_open+0x4d/0xa3 [<ffffffff8115c1f2>] ? __alloc_fd+0x1ae/0x1c0 [<ffffffff813acdff>] ? _raw_spin_unlock+0x27/0x31 [<ffffffff81140a8b>] do_sys_open+0x13c/0x1cc [<ffffffff81140a8b>] ? do_sys_open+0x13c/0x1cc [<ffffffff81140b39>] SyS_open+0x1e/0x20 [<ffffffff813ad4a5>] entry_SYSCALL_64_fastpath+0x18/0xa8 ____fput()->console_unlock() [<ffffffff8108a904>] console_unlock+0x24/0x89 [<ffffffff81271910>] con_shutdown+0x2d/0x30 [<ffffffff8125e99d>] release_tty+0x52/0x12e [<ffffffff81260722>] tty_release+0x436/0x453 [<ffffffff81142bb3>] __fput+0x107/0x1ba [<ffffffff81142c9c>] ____fput+0xe/0x10 [<ffffffff8105d504>] task_work_run+0x67/0x90 [<ffffffff810011cc>] exit_to_usermode_loop+0x66/0x84 [<ffffffff8100179c>] syscall_return_slowpath+0x8d/0x92 [<ffffffff813ad533>] entry_SYSCALL_64_fastpath+0xa6/0xa8 ... etc. > Note that console_unlock_for_printk() might be a bit > misleading. Especially when you suggest to replace it by > console_unlock() in vprintk_emit() ;-) I wonder if > console_flush_and_unlock() might be more descriptive. oh, yes, the function name was absolutely random. console_flush_and_unlock() looks good. > We might even split flush_console() into a separate function in the end. > I think that the combination with unlock() is there to make sure > that somebody will flush the last messages from printk(), see > the retry stuff. It probably won't be needed with the asynch printk(). > > Anyway, all these console_unlock() changes looks like another big step > and I suggest to do it separately. ok. > PS: I want to check more precisely the async printk patchset but > I am repeatedy sidetracked this week :-( no prob! it's a pre-merge period, no pressure. I'll re-spin the printk() patch tomorrow, I think. async console_unlock() will be separated. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 10:52 ` Jan Kara 2016-03-07 12:16 ` Jan Kara @ 2016-03-07 14:40 ` Sergey Senozhatsky 1 sibling, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-07 14:40 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On (03/07/16 11:52), Jan Kara wrote: [..] > So for UP systems, we should by default disable async printing anyway I > suppose. It is just a pointless overhead. So please just make printk_sync > default to true if !CONFIG_SMP. ok, thanks. > When IRQs are disabled, you're right we will have a change in behavior. I > don't see an easy way of avoiding delaying of printk until IRQs get > enabled. I don't want to queue work directly because that creates > possibility for lock recursion in queue_work(). And playing some tricks > with irq_works isn't easy either - you cannot actually rely on any other > CPU doing anything (even a timer tick) because of NOHZ. yes. I thought about some sort of PRINTK_IPI on smp, and queue work from PRINTK_IPI. which is a bit insane. > So if this will be a problem in practice, using a kthread will probably be > the easiest solution. probably kthread is the way to go then. Tetsuo's bad experience with workqueues also sounds a bit alarming. I'll post a new patch soon. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 10:12 ` Sergey Senozhatsky 2016-03-07 10:52 ` Jan Kara @ 2016-03-07 11:10 ` Tetsuo Handa 2016-03-07 14:36 ` Sergey Senozhatsky 1 sibling, 1 reply; 28+ messages in thread From: Tetsuo Handa @ 2016-03-07 11:10 UTC (permalink / raw) To: sergey.senozhatsky.work Cc: akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky Sergey Senozhatsky wrote: > Hello, > > On (03/07/16 09:22), Jan Kara wrote: > [..] > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > > (WQ_MEM_RECLAIM). > > > > > > [..] > > > > Even if you use printk_wq with WQ_MEM_RECLAIM for printing_work work item, > > > > printing_work_func() will not be called until current work item calls > > > > schedule_timeout_*(). That will be an undesirable random delay. If you use > > > > a dedicated kernel thread rather than a dedicated workqueue with WQ_MEM_RECLAIM, > > > > we can avoid this random delay. > > > > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > > need to look more. > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > creating a worker process and wakes up rescuer thread. However I don't see > > that as a problem... > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > I've slightly tested OOM-kill on my desktop system and haven't spotted any > printk delays (well, a test on desktop is not really representative, of > course). I wanted to tell that if kworker is running a buggy function that calls cond_resched() but does not call schedule_timeout_*() for very long time, such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can still cause huge delay if kworker does not call schedule_timeout_*(). Not specific to OOM-killer or vmstat. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 11:10 ` Tetsuo Handa @ 2016-03-07 14:36 ` Sergey Senozhatsky 0 siblings, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-07 14:36 UTC (permalink / raw) To: Tetsuo Handa Cc: sergey.senozhatsky.work, akpm, jack, pmladek, tj, linux-kernel, sergey.senozhatsky On (03/07/16 20:10), Tetsuo Handa wrote: [..] > > > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > > > need to look more. > > > > > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > > > creating a worker process and wakes up rescuer thread. However I don't see > > > that as a problem... > > > > yes, that's why I asked Tetsuo whether his concern was a wq's MAYDAY timer > > delay. the two commits that Tetsuo pointed at earlier in he loop (373ccbe59270 > > and 564e81a57f97) solved the problem by switching to WQ_MEM_RECLAIM wq. > > I've slightly tested OOM-kill on my desktop system and haven't spotted any > > printk delays (well, a test on desktop is not really representative, of > > course). > > I wanted to tell that if kworker is running a buggy function that calls > cond_resched() but does not call schedule_timeout_*() for very long time, > such delay can become many seconds. WQ_MEM_RECLAIM is a requirement for > waking up when kworker called schedule_timeout_*(). WQ_MEM_RECLAIM wq can > still cause huge delay if kworker does not call schedule_timeout_*(). > Not specific to OOM-killer or vmstat. your point is taken. thanks. -ss ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC][PATCH v2 1/2] printk: Make printk() completely async 2016-03-07 8:22 ` Jan Kara 2016-03-07 10:12 ` Sergey Senozhatsky @ 2016-03-07 15:42 ` Tejun Heo 1 sibling, 0 replies; 28+ messages in thread From: Tejun Heo @ 2016-03-07 15:42 UTC (permalink / raw) To: Jan Kara Cc: Sergey Senozhatsky, Tetsuo Handa, akpm, jack, pmladek, linux-kernel, sergey.senozhatsky Hello, On Mon, Mar 07, 2016 at 09:22:30AM +0100, Jan Kara wrote: > > > I don't know what MAYDAY is. I'm talking about a situation where printing_work > > > work item is not processed (i.e. printing_work_func() is not called) until > > > current work item calls schedule_timeout_*(). That was because the work item was percpu and not marked CPU_INTENSIVE. Either using an unbound or CPU_INTENSIVE workqueue should be enough. > > > We had a problem that since vmstat_work work item was using system_wq, > > > vmstat_work work item was not processed (i.e. vmstat_update() was not called) if > > > kworker was looping inside memory allocator without calling schedule_timeout_*() > > > due to disk_events_workfn() doing GFP_NOIO allocation). > > > > hm, just for note, none of system-wide wqs seem to have a ->rescuer thread > > (WQ_MEM_RECLAIM). Because WQ_MEM_RECLAIM only guarantees concurrency of 1, it doesn't make sense to set it to a shared workqueue. A dedicated workquee should be created per domain which needs forward progress guarantee. > > hm. yes, seems that it may take some time until workqueue wakeup() a ->rescuer thread. > > need to look more. > > Yes, it takes some time (0.1s or 2 jiffies) before workqueue code gives up > creating a worker process and wakes up rescuer thread. However I don't see > that as a problem... I don't think it matters. At that point, the system should already be thrashing heavily and everything is crawling anyway. A couple jiffies delay isn't gonna be noticeable. Thanks. -- tejun ^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC][PATCH v2 2/2] printk: Skip messages on oops 2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky 2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky @ 2016-03-05 10:55 ` Sergey Senozhatsky 1 sibling, 0 replies; 28+ messages in thread From: Sergey Senozhatsky @ 2016-03-05 10:55 UTC (permalink / raw) To: Andrew Morton Cc: Jan Kara, Petr Mladek, Tejun Heo, Tetsuo Handa, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky, Jan Kara From: Jan Kara <jack@suse.cz> When there are too many messages in the kernel printk buffer it can take very long to print them to console (especially when using slow serial console). This is undesirable during oops so when we encounter oops and there are more than 100 messages to print, print just the newest 100 messages and then the oops message. [ss: fix tiny config build] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- kernel/printk/printk.c | 36 +++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 843f3fd..0bda01e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -268,6 +268,9 @@ static u64 console_seq; static u32 console_idx; static enum log_flags console_prev; +/* current record sequence when oops happened */ +static u64 oops_start_seq; + /* the next printk record to read after the last 'clear' command */ static u64 clear_seq; static u32 clear_idx; @@ -1773,6 +1776,9 @@ asmlinkage int vprintk_emit(int facility, int level, strlen(recursion_msg)); } + if (oops_in_progress && !oops_start_seq) + oops_start_seq = log_next_seq; + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. @@ -2001,6 +2007,7 @@ static u64 syslog_seq; static u32 syslog_idx; static u64 console_seq; static u32 console_idx; +static u64 oops_start_seq; static enum log_flags syslog_prev; static u64 log_first_seq; static u32 log_first_idx; @@ -2326,6 +2333,12 @@ out: raw_spin_unlock_irqrestore(&logbuf_lock, flags); } +/* + * When oops happens and there are more messages to be printed in the printk + * buffer that this, skip some mesages and print only this many newest messages. + */ +#define PRINT_MSGS_BEFORE_OOPS 100 + /** * console_unlock - unlock the console system * @@ -2394,7 +2407,28 @@ again: seen_seq = log_next_seq; } - if (console_seq < log_first_seq) { + /* + * If oops happened and there are more than + * PRINT_MSGS_BEFORE_OOPS messages pending before oops message, + * skip them to make oops appear faster. + */ + if (oops_start_seq && + console_seq + PRINT_MSGS_BEFORE_OOPS < oops_start_seq) { + len = sprintf(text, + "** %u printk messages dropped due to oops ** ", + (unsigned)(oops_start_seq - console_seq - + PRINT_MSGS_BEFORE_OOPS)); + if (console_seq < log_first_seq) { + console_seq = log_first_seq; + console_idx = log_first_idx; + } + while (console_seq < + oops_start_seq - PRINT_MSGS_BEFORE_OOPS) { + console_idx = log_next(console_idx); + console_seq++; + } + console_prev = 0; + } else if (console_seq < log_first_seq) { len = sprintf(text, "** %u printk messages dropped ** ", (unsigned)(log_first_seq - console_seq)); -- 2.8.0.rc0 ^ permalink raw reply related [flat|nested] 28+ messages in thread
end of thread, other threads:[~2016-03-12 5:03 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-03-05 10:55 [RFC][PATCH v2 0/2] printk: Make printk() completely async Sergey Senozhatsky 2016-03-05 10:55 ` [RFC][PATCH v2 1/2] " Sergey Senozhatsky 2016-03-06 6:32 ` Sergey Senozhatsky 2016-03-06 7:18 ` Tetsuo Handa 2016-03-06 9:35 ` Sergey Senozhatsky 2016-03-06 11:06 ` Tetsuo Handa 2016-03-06 13:27 ` Sergey Senozhatsky 2016-03-06 14:54 ` Tetsuo Handa 2016-03-07 8:22 ` Jan Kara 2016-03-07 10:12 ` Sergey Senozhatsky 2016-03-07 10:52 ` Jan Kara 2016-03-07 12:16 ` Jan Kara 2016-03-07 12:37 ` Tetsuo Handa 2016-03-07 15:10 ` Sergey Senozhatsky 2016-03-07 15:49 ` Tejun Heo 2016-03-08 10:21 ` Sergey Senozhatsky 2016-03-11 17:22 ` Tejun Heo 2016-03-12 5:01 ` Sergey Senozhatsky 2016-03-09 6:09 ` Sergey Senozhatsky 2016-03-10 9:27 ` Jan Kara 2016-03-10 15:48 ` Sergey Senozhatsky 2016-03-10 9:53 ` Petr Mladek 2016-03-10 16:26 ` Sergey Senozhatsky 2016-03-07 14:40 ` Sergey Senozhatsky 2016-03-07 11:10 ` Tetsuo Handa 2016-03-07 14:36 ` Sergey Senozhatsky 2016-03-07 15:42 ` Tejun Heo 2016-03-05 10:55 ` [RFC][PATCH v2 2/2] printk: Skip messages on oops Sergey Senozhatsky
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.