* [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
From: Petr Mladek @ 2018-01-10 13:24 UTC
To: Steven Rostedt, Sergey Senozhatsky
Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo,
Pavel Machek, linux-kernel, Petr Mladek

This is the latest version of Steven's console owner/waiter logic,
plus my proposal to hide it in three helper functions. The helpers
are meant to keep the code maintainable.

The handshake really works. It happens about 10 times even during
the boot of a simple system in qemu with a fast console here. It is
definitely able to avoid some softlockups. Let's see if it is
enough in practice.

From my point of view, it is ready to go into linux-next so that
it can get some more test coverage.

Steven's patch is v4, see
https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home

Petr Mladek (1):
printk: Hide console waiter logic into helpers
Steven Rostedt (1):
printk: Add console owner and waiter logic to load balance console
writes
kernel/printk/printk.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 155 insertions(+), 1 deletion(-)
--
2.13.6

* [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Petr Mladek @ 2018-01-10 13:24 UTC
To: Steven Rostedt, Sergey Senozhatsky
Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo,
Pavel Machek, linux-kernel

From: Steven Rostedt <rostedt@goodmis.org>

From: Steven Rostedt (VMware) <rostedt@goodmis.org>

This patch implements what I discussed in Kernel Summit. I added
lockdep annotation (hopefully correctly), and it hasn't had any splats
(since I fixed some bugs in the first iterations). It did catch
problems when I had the owner covering too much. But now that the owner
is only set when actively calling the consoles, lockdep has stayed
quiet.

Here's the design again:

I added a "console_owner" which is set to a task that is actively
writing to the consoles. It is *not* the same as the owner of the
console_lock. It is only set when doing the calls to the console
functions. It is protected by a console_owner_lock which is a raw spin
lock.

There is a console_waiter. This is set when there is an active console
owner that is not current, and waiter is not set. This too is protected
by console_owner_lock.

In printk() when it tries to write to the consoles, we have:

	if (console_trylock())
		console_unlock();

Now I added an else, which will check if there is an active owner, and
no current waiter. If that is the case, then console_waiter is set, and
the task goes into a spin until it is no longer set.

When the active console owner finishes writing the current message to
the consoles, it grabs the console_owner_lock and sees if there is a
waiter, and clears console_owner.

If there is a waiter, then it breaks out of the loop, clears the waiter
flag (because that will release the waiter from its spin), and exits.
Note, it does *not* release the console semaphore. Because it is a
semaphore, there is no owner. Another task may release it. This means
that the waiter is guaranteed to be the new console owner! Which it
becomes.

Then the waiter calls console_unlock() and continues to write to the
consoles.

If another task comes along and does a printk() it too can become the
new waiter, and we wash rinse and repeat!

By Petr Mladek about possible new deadlocks:

The thing is that we move console_sem only to printk() call
that normally calls console_unlock() as well. It means that
the transferred owner should not bring new type of dependencies.
As Steven said somewhere: "If there is a deadlock, it was
there even before."

We could look at it from this side. The possible deadlock would
look like:

CPU0                            CPU1

console_unlock()

  console_owner = current;

                                spin_lockA()
                                  printk()
                                    spin = true;
                                    while (...)

    call_console_drivers()
      spin_lockA()

This would be a deadlock. CPU0 would wait for the lock A.
While CPU1 would own the lockA and would wait for CPU0
to finish calling the console drivers and pass the console_sem
owner.

But if the above is true then the following scenario was
already possible before:

CPU0

spin_lockA()
  printk()
    console_unlock()
      call_console_drivers()
        spin_lockA()

In other words, this deadlock was there even before. Such
deadlocks are prevented by using printk_deferred() in
the sections guarded by the lock A.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[pmladek@suse.com: Commit message about possible deadlocks]
---
 kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 107 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b9006617710f..7e6459abba43 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
 static struct lockdep_map console_lock_dep_map = {
 	.name = "console_lock"
 };
+static struct lockdep_map console_owner_dep_map = {
+	.name = "console_owner"
+};
 #endif
 
+static DEFINE_RAW_SPINLOCK(console_owner_lock);
+static struct task_struct *console_owner;
+static bool console_waiter;
+
 enum devkmsg_log_bits {
 	__DEVKMSG_LOG_BIT_ON = 0,
 	__DEVKMSG_LOG_BIT_OFF,
@@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore. The release will print out buffers and wake up
 		 * /dev/kmsg and syslog() users.
 		 */
-		if (console_trylock())
+		if (console_trylock()) {
 			console_unlock();
+		} else {
+			struct task_struct *owner = NULL;
+			bool waiter;
+			bool spin = false;
+
+			printk_safe_enter_irqsave(flags);
+
+			raw_spin_lock(&console_owner_lock);
+			owner = READ_ONCE(console_owner);
+			waiter = READ_ONCE(console_waiter);
+			if (!waiter && owner && owner != current) {
+				WRITE_ONCE(console_waiter, true);
+				spin = true;
+			}
+			raw_spin_unlock(&console_owner_lock);
+
+			/*
+			 * If there is an active printk() writing to the
+			 * consoles, instead of having it write our data too,
+			 * see if we can offload that load from the active
+			 * printer, and do some printing ourselves.
+			 * Go into a spin only if there isn't already a waiter
+			 * spinning, and there is an active printer, and
+			 * that active printer isn't us (recursive printk?).
+			 */
+			if (spin) {
+				/* We spin waiting for the owner to release us */
+				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+				/* Owner will clear console_waiter on hand off */
+				while (READ_ONCE(console_waiter))
+					cpu_relax();
+
+				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+				printk_safe_exit_irqrestore(flags);
+
+				/*
+				 * The owner passed the console lock to us.
+				 * Since we did not spin on console lock, annotate
+				 * this as a trylock. Otherwise lockdep will
+				 * complain.
+				 */
+				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+				console_unlock();
+				printk_safe_enter_irqsave(flags);
+			}
+			printk_safe_exit_irqrestore(flags);
+
+		}
 	}
 
 	return printed_len;
@@ -2141,6 +2196,7 @@ void console_unlock(void)
 	static u64 seen_seq;
 	unsigned long flags;
 	bool wake_klogd = false;
+	bool waiter = false;
 	bool do_cond_resched, retry;
 
 	if (console_suspended) {
@@ -2229,14 +2285,64 @@ void console_unlock(void)
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
 
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 */
+		raw_spin_lock(&console_owner_lock);
+		console_owner = current;
+		raw_spin_unlock(&console_owner_lock);
+
+		/* The waiter may spin on us after setting console_owner */
+		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+
+		raw_spin_lock(&console_owner_lock);
+		waiter = READ_ONCE(console_waiter);
+		console_owner = NULL;
+		raw_spin_unlock(&console_owner_lock);
+
+		/*
+		 * If there is a waiter waiting for us, then pass the
+		 * rest of the work load over to that waiter.
+		 */
+		if (waiter)
+			break;
+
+		/* There was no waiter, and nothing will spin on us here */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+
 		printk_safe_exit_irqrestore(flags);
 
 		if (do_cond_resched)
 			cond_resched();
 	}
+
+	/*
+	 * If there is an active waiter waiting on the console_lock.
+	 * Pass off the printing to the waiter, and the waiter
+	 * will continue printing on its CPU, and when all writing
+	 * has finished, the last printer will wake up klogd.
+	 */
+	if (waiter) {
+		WRITE_ONCE(console_waiter, false);
+		/* The waiter is now free to continue */
+		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		/*
+		 * Hand off console_lock to waiter. The waiter will perform
+		 * the up(). After this, the waiter is the console_lock owner.
+		 */
+		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+		printk_safe_exit_irqrestore(flags);
+		/* Note, if waiter is set, logbuf_lock is not held */
+		return;
+	}
+
 	console_locked = 0;
 
 	/* Release the exclusive_console once it is used */
-- 
2.13.6
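
The hand-off described in the commit message above can be walked through in a small userspace model. What follows is only a sketch under stated assumptions, not the kernel code: it is single-threaded, every toy_* name and struct toy_console are made up for illustration, a plain bool stands in for console_sem, and console_owner_lock and the lockdep annotations are left out because nothing here runs concurrently. It scripts one interleaving: CPU0 is printing, CPU1 calls printk() and becomes the waiter, CPU2 calls printk() and finds a waiter already present, then CPU0 hands the semaphore to CPU1.

```c
/* Toy, single-threaded model of the owner/waiter hand-off. Not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

struct toy_console {
	bool sem_locked;  /* stands in for console_sem */
	int sem_holder;   /* model bookkeeping only; a semaphore has no owner */
	int owner;        /* console_owner: set only while "calling drivers" */
	bool waiter;      /* console_waiter */
	int waiter_cpu;   /* model bookkeeping only */
	int pending;      /* messages stored but not yet printed */
	int printed[3];   /* messages printed by each CPU */
};

enum toy_result { TOY_GOT_LOCK, TOY_SPINNING, TOY_LEFT_TO_OWNER };

/* printk(): store the message, then try to become the printer. */
enum toy_result toy_printk(struct toy_console *c, int cpu)
{
	c->pending++;
	if (!c->sem_locked) {            /* console_trylock() succeeded */
		c->sem_locked = true;
		c->sem_holder = cpu;
		return TOY_GOT_LOCK;
	}
	/* The else branch: wait only if there is an active owner,
	 * the owner is not us, and nobody else is waiting yet. */
	if (!c->waiter && c->owner != NO_CPU && c->owner != cpu) {
		c->waiter = true;
		c->waiter_cpu = cpu;
		return TOY_SPINNING;
	}
	return TOY_LEFT_TO_OWNER;
}

/* First half of one console_unlock() loop pass: become the owner. */
void toy_begin_message(struct toy_console *c, int cpu)
{
	assert(c->sem_locked && c->sem_holder == cpu && c->pending > 0);
	c->owner = cpu;
}

/* Second half: returns true if the semaphore was handed to the waiter. */
bool toy_end_message(struct toy_console *c, int cpu)
{
	bool waiter = c->waiter;

	assert(c->owner == cpu);
	c->pending--;
	c->printed[cpu]++;
	c->owner = NO_CPU;
	if (!waiter)
		return false;
	c->waiter = false;               /* releases the spinner */
	c->sem_holder = c->waiter_cpu;   /* semaphore stays locked: no up() */
	c->waiter_cpu = NO_CPU;
	return true;
}

/* up(): only reached when no hand-off happened. */
void toy_up(struct toy_console *c, int cpu)
{
	assert(c->sem_holder == cpu && c->pending == 0);
	c->sem_locked = false;
	c->sem_holder = NO_CPU;
}

/* Scripted interleaving of three CPUs. */
void toy_walkthrough(struct toy_console *c)
{
	enum toy_result r0, r1, r2;
	bool handed_off;

	*c = (struct toy_console){ .sem_holder = NO_CPU, .owner = NO_CPU,
				   .waiter_cpu = NO_CPU };

	r0 = toy_printk(c, 0);           /* CPU0 takes the semaphore */
	toy_begin_message(c, 0);
	r1 = toy_printk(c, 1);           /* CPU1 becomes the waiter */
	r2 = toy_printk(c, 2);           /* CPU2 sees a waiter and leaves */
	handed_off = toy_end_message(c, 0);
	assert(r0 == TOY_GOT_LOCK && r1 == TOY_SPINNING);
	assert(r2 == TOY_LEFT_TO_OWNER && handed_off);

	/* CPU1 leaves its spin and prints whatever is still pending. */
	while (c->pending > 0) {
		toy_begin_message(c, 1);
		handed_off = toy_end_message(c, 1);
		assert(!handed_off);
	}
	toy_up(c, 1);
}
```

Calling toy_walkthrough() on a struct toy_console leaves CPU0 with one printed message and CPU1 with two. CPU0 stops after its current message instead of draining the whole buffer, and CPU2 returns immediately because only one waiter is allowed at a time.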

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Steven Rostedt @ 2018-01-10 16:50 UTC
To: Petr Mladek
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 14:24:17 +0100
Petr Mladek <pmladek@suse.com> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>

Please remove the above From:, it will overwrite the one below which I
would prefer to have.

Thanks!

-- Steve

>
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>
> This patch implements what I discussed in Kernel Summit. I added
> lockdep annotation (hopefully correctly), and it hasn't had any splats
> (since I fixed some bugs in the first iterations). It did catch
> problems when I had the owner covering too much. But now that the owner
> is only set when actively calling the consoles, lockdep has stayed
> quiet.
>

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Steven Rostedt @ 2018-01-12 16:54 UTC
To: Petr Mladek
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On Wed, 10 Jan 2018 14:24:17 +0100
Petr Mladek <pmladek@suse.com> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
>
> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>
> This patch implements what I discussed in Kernel Summit.

[..]

> In other words, this deadlock was there even before. Such
> deadlocks are prevented by using printk_deferred() in
> the sections guarded by the lock A.

Petr,

Please add this here:

====

To demonstrate the issue, this module has been shown to lock up a
system with 4 CPUs and a slow console (like a serial console). It is
also able to lock up an 8 CPU system with only a fast (VGA) console, by
passing in "loops=100". The changes in this commit prevent this module
from locking up the system.

 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
 #include <linux/hrtimer.h>

 static bool stop_testing;
 static unsigned int loops = 1;

 static void preempt_printk_workfn(struct work_struct *work)
 {
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 " XXX NOPREEMPT");
			preempt_enable();
		}
		msleep(1);
	}
 }

 static struct work_struct __percpu *works;

 static void finish(void)
 {
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
 }

 static int __init test_init(void)
 {
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
 }

 static void __exit test_exit(void)
 {
	finish();
 }

 module_param(loops, uint, 0);
 module_init(test_init);
 module_exit(test_exit);
 MODULE_LICENSE("GPL");

====

Hmm, how does one have git commit not remove the C preprocessor at the
start of the module?

-- Steve

>
> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> [pmladek@suse.com: Commit message about possible deadlocks]
> ---

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Steven Rostedt @ 2018-01-12 17:11 UTC
To: Petr Mladek
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 11:54:54 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>
>
>
>
> Hmm, how does one have git commit not remove the C preprocessor at the
> start of the module?

Probably just add a space in front of the entire program.

-- Steve

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Rasmus Villemoes @ 2018-01-17 19:13 UTC
To: Steven Rostedt, Petr Mladek
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On 2018-01-12 18:11, Steven Rostedt wrote:
> On Fri, 12 Jan 2018 11:54:54 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> #include <linux/module.h>
>> #include <linux/delay.h>
>> #include <linux/sched.h>
>> #include <linux/mutex.h>
>> #include <linux/workqueue.h>
>> #include <linux/hrtimer.h>
>>
>>
>
>
>>
>> Hmm, how does one have git commit not remove the C preprocessor at the
>> start of the module?
>
> Probably just add a space in front of the entire program.

If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
Something like

	git config commit.cleanup scissors

should do the trick. Instead of stripping all lines starting with #,
that will only strip stuff below a line containing

	# ------------------------ >8 ------------------------

and git should be smart enough to insert that in the editor it fires up
for a commit message.

[1] https://github.com/git/git/blob/master/Documentation/RelNotes/2.0.0.txt

Rasmus

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Steven Rostedt @ 2018-01-17 19:33 UTC
To: Rasmus Villemoes
Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 20:13:28 +0100
Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:

> If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
> Something like
>
> 	git config commit.cleanup scissors
>
> should do the trick. Instead of stripping all lines starting with #,
> that will only strip stuff below a line containing
>
> 	# ------------------------ >8 ------------------------
>
> and git should be smart enough to insert that in the editor it fires up
> for a commit message.
>
>
> [1] https://github.com/git/git/blob/master/Documentation/RelNotes/2.0.0.txt
>
>

Thanks for the pointer.

-- Steve

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Sergey Senozhatsky @ 2018-01-19 9:51 UTC
To: Rasmus Villemoes
Cc: Steven Rostedt, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel

On (01/17/18 20:13), Rasmus Villemoes wrote:
[..]
> >> Hmm, how does one have git commit not remove the C preprocessor at the
> >> start of the module?
> >
> > Probably just add a space in front of the entire program.
>
> If you use at least git 2.0.0 [1], set commit.cleanup to "scissors".
> Something like
>
> 	git config commit.cleanup scissors
>
> should do the trick. Instead of stripping all lines starting with #,
> that will only strip stuff below a line containing
>
> 	# ------------------------ >8 ------------------------

one thing that it changes is that now when you squash commits

	# This is the first patch

	first patch commit messages

	# This is the second patch

	second patch commit message

	# ------------------------ >8 ------------------------

those "# This is the first patch" and "# This is the second patch"
won't be removed automatically. takes some time to get used to it.

	-ss

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Pavel Machek @ 2018-01-18 22:03 UTC
To: Steven Rostedt
Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Tejun Heo, linux-kernel

Hi!

> > In other words, this deadlock was there even before. Such
> > deadlocks are prevented by using printk_deferred() in
> > the sections guarded by the lock A.
>
> Petr,
>
> Please add this here:
>
> ====
>
> To demonstrate the issue, this module has been shown to lock up a
> system with 4 CPUs and a slow console (like a serial console). It is
> also able to lock up an 8 CPU system with only a fast (VGA) console, by
> passing in "loops=100". The changes in this commit prevent this module
> from locking up the system.
>
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>

Programs in commit messages. Not the preferred way to distribute code,
I'd say. What about putting it into the kernel selftests directory or
something like that?

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
From: Steven Rostedt @ 2018-01-19 0:20 UTC
To: Pavel Machek
Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Tejun Heo, linux-kernel

On Thu, 18 Jan 2018 23:03:24 +0100
Pavel Machek <pavel@ucw.cz> wrote:

> > To demonstrate the issue, this module has been shown to lock up a
> > system with 4 CPUs and a slow console (like a serial console). It is
> > also able to lock up an 8 CPU system with only a fast (VGA) console, by
> > passing in "loops=100". The changes in this commit prevent this module
> > from locking up the system.
> >
> > #include <linux/module.h>
> > #include <linux/delay.h>
> > #include <linux/sched.h>
> > #include <linux/mutex.h>
> > #include <linux/workqueue.h>
> > #include <linux/hrtimer.h>
>
> Programs in commit messages. Not the preferred way to distribute code,
> I'd say. What about putting it into the kernel selftests directory or
> something like that?

It's not really a program, but a module. I could add a real module that
can test this, and people can modprobe it if they want to make sure
there are no regressions. I can send a patch.

-- Steve
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek 2018-01-10 16:50 ` Steven Rostedt 2018-01-12 16:54 ` Steven Rostedt @ 2018-01-17 2:19 ` Byungchul Park 2018-01-17 4:54 ` Byungchul Park ` (2 more replies) 2 siblings, 3 replies; 140+ messages in thread From: Byungchul Park @ 2018-01-17 2:19 UTC (permalink / raw) To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On 1/10/2018 10:24 PM, Petr Mladek wrote: > From: Steven Rostedt <rostedt@goodmis.org> > > From: Steven Rostedt (VMware) <rostedt@goodmis.org> > > This patch implements what I discussed in Kernel Summit. I added > lockdep annotation (hopefully correctly), and it hasn't had any splats > (since I fixed some bugs in the first iterations). It did catch > problems when I had the owner covering too much. But now that the owner > is only set when actively calling the consoles, lockdep has stayed > quiet. > > Here's the design again: > > I added a "console_owner" which is set to a task that is actively > writing to the consoles. It is *not* the same as the owner of the > console_lock. It is only set when doing the calls to the console > functions. It is protected by a console_owner_lock which is a raw spin > lock. > > There is a console_waiter. This is set when there is an active console > owner that is not current, and waiter is not set. This too is protected > by console_owner_lock. > > In printk() when it tries to write to the consoles, we have: > > if (console_trylock()) > console_unlock(); > > Now I added an else, which will check if there is an active owner, and > no current waiter. 
If that is the case, then console_waiter is set, and > the task goes into a spin until it is no longer set. > > When the active console owner finishes writing the current message to > the consoles, it grabs the console_owner_lock and sees if there is a > waiter, and clears console_owner. > > If there is a waiter, then it breaks out of the loop, clears the waiter > flag (because that will release the waiter from its spin), and exits. > Note, it does *not* release the console semaphore. Because it is a > semaphore, there is no owner. Another task may release it. This means > that the waiter is guaranteed to be the new console owner! Which it > becomes. > > Then the waiter calls console_unlock() and continues to write to the > consoles. > > If another task comes along and does a printk() it too can become the > new waiter, and we wash rinse and repeat! > > By Petr Mladek about possible new deadlocks: > > The thing is that we move console_sem only to printk() call > that normally calls console_unlock() as well. It means that > the transferred owner should not bring new type of dependencies. > As Steven said somewhere: "If there is a deadlock, it was > there even before." > > We could look at it from this side. The possible deadlock would > look like: > > CPU0 CPU1 > > console_unlock() > > console_owner = current; > > spin_lockA() > printk() > spin = true; > while (...) > > call_console_drivers() > spin_lockA() > > This would be a deadlock. CPU0 would wait for the lock A. > While CPU1 would own the lockA and would wait for CPU0 > to finish calling the console drivers and pass the console_sem > owner. > > But if the above is true than the following scenario was > already possible before: > > CPU0 > > spin_lockA() > printk() > console_unlock() > call_console_drivers() > spin_lockA() > > By other words, this deadlock was there even before. Such > deadlocks are prevented by using printk_deferred() in > the sections guarded by the lock A. 
Hello, I didn't see what you did, at the last version. You were tring to transfer the semaphore owner and make it taken over. I see. But, what I mentioned last time is still valid. See below. > Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > [pmladek@suse.com: Commit message about possible deadlocks] > --- > kernel/printk/printk.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 107 insertions(+), 1 deletion(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index b9006617710f..7e6459abba43 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers); > static struct lockdep_map console_lock_dep_map = { > .name = "console_lock" > }; > +static struct lockdep_map console_owner_dep_map = { > + .name = "console_owner" > +}; > #endif > > +static DEFINE_RAW_SPINLOCK(console_owner_lock); > +static struct task_struct *console_owner; > +static bool console_waiter; > + > enum devkmsg_log_bits { > __DEVKMSG_LOG_BIT_ON = 0, > __DEVKMSG_LOG_BIT_OFF, > @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level, > * semaphore. The release will print out buffers and wake up > * /dev/kmsg and syslog() users. > */ > - if (console_trylock()) > + if (console_trylock()) { > console_unlock(); > + } else { > + struct task_struct *owner = NULL; > + bool waiter; > + bool spin = false; > + > + printk_safe_enter_irqsave(flags); > + > + raw_spin_lock(&console_owner_lock); > + owner = READ_ONCE(console_owner); > + waiter = READ_ONCE(console_waiter); > + if (!waiter && owner && owner != current) { > + WRITE_ONCE(console_waiter, true); > + spin = true; > + } > + raw_spin_unlock(&console_owner_lock); > + > + /* > + * If there is an active printk() writing to the > + * consoles, instead of having it write our data too, > + * see if we can offload that load from the active > + * printer, and do some printing ourselves. 
> +			 * Go into a spin only if there isn't already a waiter
> +			 * spinning, and there is an active printer, and
> +			 * that active printer isn't us (recursive printk?).
> +			 */
> +			if (spin) {
> +				/* We spin waiting for the owner to release us */
> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +				/* Owner will clear console_waiter on hand off */
> +				while (READ_ONCE(console_waiter))
> +					cpu_relax();
> +
> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you move this over "while (READ_ONCE(console_waiter))" and
right after acquire()?

As I said last time, only acquisitions between acquire() and release()
are meaningful. Are you taking care of acquisitions within cpu_relax()?
If so, leave it.

> +				printk_safe_exit_irqrestore(flags);
> +
> +				/*
> +				 * The owner passed the console lock to us.
> +				 * Since we did not spin on console lock, annotate
> +				 * this as a trylock. Otherwise lockdep will
> +				 * complain.
> +				 */
> +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> +				console_unlock();
> +				printk_safe_enter_irqsave(flags);
> +			}
> +			printk_safe_exit_irqrestore(flags);
> +
> +		}
>  	}
>
>  	return printed_len;
> @@ -2141,6 +2196,7 @@ void console_unlock(void)
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
> +	bool waiter = false;
>  	bool do_cond_resched, retry;
>
>  	if (console_suspended) {
> @@ -2229,14 +2285,64 @@ void console_unlock(void)
>  		console_seq++;
>  		raw_spin_unlock(&logbuf_lock);
>
> +		/*
> +		 * While actively printing out messages, if another printk()
> +		 * were to occur on another CPU, it may wait for this one to
> +		 * finish. This task can not be preempted if there is a
> +		 * waiter waiting to take over.
> +		 */
> +		raw_spin_lock(&console_owner_lock);
> +		console_owner = current;
> +		raw_spin_unlock(&console_owner_lock);
> +
> +		/* The waiter may spin on us after setting console_owner */
> +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
> +
> +		raw_spin_lock(&console_owner_lock);
> +		waiter = READ_ONCE(console_waiter);
> +		console_owner = NULL;
> +		raw_spin_unlock(&console_owner_lock);
> +
> +		/*
> +		 * If there is a waiter waiting for us, then pass the
> +		 * rest of the work load over to that waiter.
> +		 */
> +		if (waiter)
> +			break;
> +
> +		/* There was no waiter, and nothing will spin on us here */
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you move this over "if (waiter)"?

> +
>  		printk_safe_exit_irqrestore(flags);
>
>  		if (do_cond_resched)
>  			cond_resched();
>  	}
> +
> +	/*
> +	 * If there is an active waiter waiting on the console_lock.
> +	 * Pass off the printing to the waiter, and the waiter
> +	 * will continue printing on its CPU, and when all writing
> +	 * has finished, the last printer will wake up klogd.
> +	 */
> +	if (waiter) {
> +		WRITE_ONCE(console_waiter, false);
> +		/* The waiter is now free to continue */
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);

Why don't you remove this release() after relocating the upper one?

> +		/*
> +		 * Hand off console_lock to waiter. The waiter will perform
> +		 * the up(). After this, the waiter is the console_lock owner.
> +		 */
> +		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> +		printk_safe_exit_irqrestore(flags);
> +		/* Note, if waiter is set, logbuf_lock is not held */
> +		return;
> +	}
> +
>  	console_locked = 0;
>
>  	/* Release the exclusive_console once it is used */
>

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
2018-01-17 2:19 ` Byungchul Park
@ 2018-01-17 4:54 ` Byungchul Park
2018-01-17 7:34 ` Byungchul Park
2018-01-17 12:04 ` Petr Mladek
2 siblings, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-17 4:54 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky
Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
linux-kernel, kernel-team

On 1/17/2018 11:19 AM, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>
>> This patch implements what I discussed in Kernel Summit. I added
>> lockdep annotation (hopefully correctly), and it hasn't had any splats
>> (since I fixed some bugs in the first iterations). It did catch
>> problems when I had the owner covering too much. But now that the owner
>> is only set when actively calling the consoles, lockdep has stayed
>> quiet.
>>
>> Here's the design again:
>>
>> I added a "console_owner" which is set to a task that is actively
>> writing to the consoles. It is *not* the same as the owner of the
>> console_lock. It is only set when doing the calls to the console
>> functions. It is protected by a console_owner_lock which is a raw spin
>> lock.
>>
>> There is a console_waiter. This is set when there is an active console
>> owner that is not current, and waiter is not set. This too is protected
>> by console_owner_lock.
>>
>> In printk() when it tries to write to the consoles, we have:
>>
>>     if (console_trylock())
>>         console_unlock();
>>
>> Now I added an else, which will check if there is an active owner, and
>> no current waiter.
>> If that is the case, then console_waiter is set, and
>> the task goes into a spin until it is no longer set.
>>
>> When the active console owner finishes writing the current message to
>> the consoles, it grabs the console_owner_lock and sees if there is a
>> waiter, and clears console_owner.
>>
>> If there is a waiter, then it breaks out of the loop, clears the waiter
>> flag (because that will release the waiter from its spin), and exits.
>> Note, it does *not* release the console semaphore. Because it is a
>> semaphore, there is no owner. Another task may release it. This means
>> that the waiter is guaranteed to be the new console owner! Which it
>> becomes.
>>
>> Then the waiter calls console_unlock() and continues to write to the
>> consoles.
>>
>> If another task comes along and does a printk() it too can become the
>> new waiter, and we wash rinse and repeat!
>>
>> By Petr Mladek about possible new deadlocks:
>>
>> The thing is that we move console_sem only to printk() call
>> that normally calls console_unlock() as well. It means that
>> the transferred owner should not bring new type of dependencies.
>> As Steven said somewhere: "If there is a deadlock, it was
>> there even before."
>>
>> We could look at it from this side. The possible deadlock would
>> look like:
>>
>>     CPU0                            CPU1
>>
>>     console_unlock()
>>
>>       console_owner = current;
>>
>>                                     spin_lockA()
>>                                       printk()
>>                                         spin = true;
>>                                         while (...)
>>
>>         call_console_drivers()
>>           spin_lockA()
>>
>> This would be a deadlock. CPU0 would wait for the lock A.
>> While CPU1 would own the lockA and would wait for CPU0
>> to finish calling the console drivers and pass the console_sem
>> owner.
>>
>> But if the above is true than the following scenario was
>> already possible before:
>>
>>     CPU0
>>
>>     spin_lockA()
>>       printk()
>>         console_unlock()
>>           call_console_drivers()
>>             spin_lockA()
>>
>> By other words, this deadlock was there even before.
>> Such deadlocks are prevented by using printk_deferred() in
>> the sections guarded by the lock A.
>
> Hello,
>
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.
>
> But, what I mentioned last time is still valid. See below.

Of course, it's not an important thing but trivial one though.

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
2018-01-17 2:19 ` Byungchul Park
2018-01-17 4:54 ` Byungchul Park
@ 2018-01-17 7:34 ` Byungchul Park
2018-01-17 12:04 ` Petr Mladek
2 siblings, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-17 7:34 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, Sergey Senozhatsky
Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek,
linux-kernel, kernel-team

On 1/17/2018 11:19 AM, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>> From: Steven Rostedt <rostedt@goodmis.org>
>>
>> From: Steven Rostedt (VMware) <rostedt@goodmis.org>
>>
>> This patch implements what I discussed in Kernel Summit. I added
>> lockdep annotation (hopefully correctly), and it hasn't had any splats
>> (since I fixed some bugs in the first iterations). It did catch
>> problems when I had the owner covering too much. But now that the owner
>> is only set when actively calling the consoles, lockdep has stayed
>> quiet.
>>
>> Here's the design again:
>>
>> I added a "console_owner" which is set to a task that is actively
>> writing to the consoles. It is *not* the same as the owner of the
>> console_lock. It is only set when doing the calls to the console
>> functions. It is protected by a console_owner_lock which is a raw spin
>> lock.
>>
>> There is a console_waiter. This is set when there is an active console
>> owner that is not current, and waiter is not set. This too is protected
>> by console_owner_lock.
>>
>> In printk() when it tries to write to the consoles, we have:
>>
>>     if (console_trylock())
>>         console_unlock();
>>
>> Now I added an else, which will check if there is an active owner, and
>> no current waiter.
>> If that is the case, then console_waiter is set, and
>> the task goes into a spin until it is no longer set.
>>
>> When the active console owner finishes writing the current message to
>> the consoles, it grabs the console_owner_lock and sees if there is a
>> waiter, and clears console_owner.
>>
>> If there is a waiter, then it breaks out of the loop, clears the waiter
>> flag (because that will release the waiter from its spin), and exits.
>> Note, it does *not* release the console semaphore. Because it is a
>> semaphore, there is no owner. Another task may release it. This means
>> that the waiter is guaranteed to be the new console owner! Which it
>> becomes.
>>
>> Then the waiter calls console_unlock() and continues to write to the
>> consoles.
>>
>> If another task comes along and does a printk() it too can become the
>> new waiter, and we wash rinse and repeat!
>>
>> By Petr Mladek about possible new deadlocks:
>>
>> The thing is that we move console_sem only to printk() call
>> that normally calls console_unlock() as well. It means that
>> the transferred owner should not bring new type of dependencies.
>> As Steven said somewhere: "If there is a deadlock, it was
>> there even before."
>>
>> We could look at it from this side. The possible deadlock would
>> look like:
>>
>>     CPU0                            CPU1
>>
>>     console_unlock()
>>
>>       console_owner = current;
>>
>>                                     spin_lockA()
>>                                       printk()
>>                                         spin = true;
>>                                         while (...)
>>
>>         call_console_drivers()
>>           spin_lockA()
>>
>> This would be a deadlock. CPU0 would wait for the lock A.
>> While CPU1 would own the lockA and would wait for CPU0
>> to finish calling the console drivers and pass the console_sem
>> owner.
>>
>> But if the above is true than the following scenario was
>> already possible before:
>>
>>     CPU0
>>
>>     spin_lockA()
>>       printk()
>>         console_unlock()
>>           call_console_drivers()
>>             spin_lockA()
>>
>> By other words, this deadlock was there even before.
>> Such deadlocks are prevented by using printk_deferred() in
>> the sections guarded by the lock A.
>
> Hello,
>
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.
>
> But, what I mentioned last time is still valid. See below.
>
>> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>> [pmladek@suse.com: Commit message about possible deadlocks]
>> ---
>>  kernel/printk/printk.c | 108
>> ++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 107 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index b9006617710f..7e6459abba43 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -86,8 +86,15 @@ EXPORT_SYMBOL_GPL(console_drivers);
>>  static struct lockdep_map console_lock_dep_map = {
>>  	.name = "console_lock"
>>  };
>> +static struct lockdep_map console_owner_dep_map = {
>> +	.name = "console_owner"
>> +};
>>  #endif
>> +static DEFINE_RAW_SPINLOCK(console_owner_lock);
>> +static struct task_struct *console_owner;
>> +static bool console_waiter;
>> +
>>  enum devkmsg_log_bits {
>>  	__DEVKMSG_LOG_BIT_ON = 0,
>>  	__DEVKMSG_LOG_BIT_OFF,
>> @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int
>> level,
>>  		 * semaphore. The release will print out buffers and wake up
>>  		 * /dev/kmsg and syslog() users.
>>  		 */
>> -		if (console_trylock())
>> +		if (console_trylock()) {
>>  			console_unlock();
>> +		} else {
>> +			struct task_struct *owner = NULL;
>> +			bool waiter;
>> +			bool spin = false;
>> +
>> +			printk_safe_enter_irqsave(flags);
>> +
>> +			raw_spin_lock(&console_owner_lock);
>> +			owner = READ_ONCE(console_owner);
>> +			waiter = READ_ONCE(console_waiter);
>> +			if (!waiter && owner && owner != current) {
>> +				WRITE_ONCE(console_waiter, true);
>> +				spin = true;
>> +			}
>> +			raw_spin_unlock(&console_owner_lock);
>> +
>> +			/*
>> +			 * If there is an active printk() writing to the
>> +			 * consoles, instead of having it write our data too,
>> +			 * see if we can offload that load from the active
>> +			 * printer, and do some printing ourselves.
>> +			 * Go into a spin only if there isn't already a waiter
>> +			 * spinning, and there is an active printer, and
>> +			 * that active printer isn't us (recursive printk?).
>> +			 */
>> +			if (spin) {
>> +				/* We spin waiting for the owner to release us */
>> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>> +				/* Owner will clear console_waiter on hand off */
>> +				while (READ_ONCE(console_waiter))
>> +					cpu_relax();
>> +
>> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
>
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

In addition, this way would be correct if you intended to use
cross-lock's map here, assuming cross-release alive..

But anyway this is just a typical acquire/release pair so we don't
usually use the pair in this way.

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
2018-01-17 2:19 ` Byungchul Park
2018-01-17 4:54 ` Byungchul Park
2018-01-17 7:34 ` Byungchul Park
@ 2018-01-17 12:04 ` Petr Mladek
2018-01-18 1:53 ` Byungchul Park
2 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-17 12:04 UTC (permalink / raw)
To: Byungchul Park
Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On Wed 2018-01-17 11:19:53, Byungchul Park wrote:
> On 1/10/2018 10:24 PM, Petr Mladek wrote:
> > From: Steven Rostedt <rostedt@goodmis.org>

> > By Petr Mladek about possible new deadlocks:
> >
> > The thing is that we move console_sem only to printk() call
> > that normally calls console_unlock() as well. It means that
> > the transferred owner should not bring new type of dependencies.
> > As Steven said somewhere: "If there is a deadlock, it was
> > there even before."
> >
> > We could look at it from this side. The possible deadlock would
> > look like:
> >
> >     CPU0                            CPU1
> >
> >     console_unlock()
> >
> >       console_owner = current;
> >
> >                                     spin_lockA()
> >                                       printk()
> >                                         spin = true;
> >                                         while (...)
> >
> >         call_console_drivers()
> >           spin_lockA()
> >
> > This would be a deadlock. CPU0 would wait for the lock A.
> > While CPU1 would own the lockA and would wait for CPU0
> > to finish calling the console drivers and pass the console_sem
> > owner.
> >
> > But if the above is true than the following scenario was
> > already possible before:
> >
> >     CPU0
> >
> >     spin_lockA()
> >       printk()
> >         console_unlock()
> >           call_console_drivers()
> >             spin_lockA()
> >
> > By other words, this deadlock was there even before.
> > Such deadlocks are prevented by using printk_deferred() in
> > the sections guarded by the lock A.
>
> Hello,
>
> I didn't see what you did, at the last version. You were
> tring to transfer the semaphore owner and make it taken
> over. I see.

I realized that I did not understand lockdep and especially
the cross-release stuff enough to be sure about the annotations.
In addition, the cross-release feature was removed, ...

Instead, I made a proof by contradiction. A very simplified summary
is mentioned in the commit message above. I believe that
the new dependency actually does not bring any new risk
of a deadlock.

Anyway, the last version of the code can be found in printk.git,
for-4.16-console-waiter-logic branch, see
https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic

It is also merged into linux-next.

> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index b9006617710f..7e6459abba43 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  		 * semaphore. The release will print out buffers and wake up
> >  		 * /dev/kmsg and syslog() users.
> >  		 */
> > -		if (console_trylock())
> > +		if (console_trylock()) {
> >  			console_unlock();
> > +		} else {
> > +			struct task_struct *owner = NULL;
> > +			bool waiter;
> > +			bool spin = false;
> > +
> > +			printk_safe_enter_irqsave(flags);
> > +
> > +			raw_spin_lock(&console_owner_lock);
> > +			owner = READ_ONCE(console_owner);
> > +			waiter = READ_ONCE(console_waiter);
> > +			if (!waiter && owner && owner != current) {
> > +				WRITE_ONCE(console_waiter, true);
> > +				spin = true;
> > +			}
> > +			raw_spin_unlock(&console_owner_lock);
> > +
> > +			/*
> > +			 * If there is an active printk() writing to the
> > +			 * consoles, instead of having it write our data too,
> > +			 * see if we can offload that load from the active
> > +			 * printer, and do some printing ourselves.
> > +			 * Go into a spin only if there isn't already a waiter
> > +			 * spinning, and there is an active printer, and
> > +			 * that active printer isn't us (recursive printk?).
> > +			 */
> > +			if (spin) {
> > +				/* We spin waiting for the owner to release us */
> > +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +				/* Owner will clear console_waiter on hand off */
> > +				while (READ_ONCE(console_waiter))
> > +					cpu_relax();
> > +
> > +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>
> Why don't you move this over "while (READ_ONCE(console_waiter))" and
> right after acquire()?
>
> As I said last time, only acquisitions between acquire() and release()
> are meaningful. Are you taking care of acquisitions within cpu_relax()?
> If so, leave it.

We are simulating a spinlock here. The above code corresponds to

	spin_lock(&console_owner_spin_lock);
	spin_unlock(&console_owner_spin_lock);

I mean that spin_acquire() + while-cycle corresponds
to spin_lock(). And spin_release() corresponds to
spin_unlock().

> > +				printk_safe_exit_irqrestore(flags);
> > +
> > +				/*
> > +				 * The owner passed the console lock to us.
> > +				 * Since we did not spin on console lock, annotate
> > +				 * this as a trylock. Otherwise lockdep will
> > +				 * complain.
> > +				 */
> > +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> > +				console_unlock();
> > +				printk_safe_enter_irqsave(flags);
> > +			}
> > +			printk_safe_exit_irqrestore(flags);
> > +
> > +		}
> >  	}
> >  	return printed_len;
> > @@ -2141,6 +2196,7 @@ void console_unlock(void)
> >  	static u64 seen_seq;
> >  	unsigned long flags;
> >  	bool wake_klogd = false;
> > +	bool waiter = false;
> >  	bool do_cond_resched, retry;
> >  	if (console_suspended) {
> > @@ -2229,14 +2285,64 @@ void console_unlock(void)
> >  		console_seq++;
> >  		raw_spin_unlock(&logbuf_lock);
> > +		/*
> > +		 * While actively printing out messages, if another printk()
> > +		 * were to occur on another CPU, it may wait for this one to
> > +		 * finish.
> > +		 * This task can not be preempted if there is a
> > +		 * waiter waiting to take over.
> > +		 */
> > +		raw_spin_lock(&console_owner_lock);
> > +		console_owner = current;
> > +		raw_spin_unlock(&console_owner_lock);
> > +
> > +		/* The waiter may spin on us after setting console_owner */
> > +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> > +
> >  		stop_critical_timings();	/* don't trace print latency */
> >  		call_console_drivers(ext_text, ext_len, text, len);
> >  		start_critical_timings();
> > +
> > +		raw_spin_lock(&console_owner_lock);
> > +		waiter = READ_ONCE(console_waiter);
> > +		console_owner = NULL;
> > +		raw_spin_unlock(&console_owner_lock);
> > +
> > +		/*
> > +		 * If there is a waiter waiting for us, then pass the
> > +		 * rest of the work load over to that waiter.
> > +		 */
> > +		if (waiter)
> > +			break;
> > +
> > +		/* There was no waiter, and nothing will spin on us here */
> > +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>
> Why don't you move this over "if (waiter)"?

We want to actually release the lock before calling spin_release,
see below.

> > +
> >  		printk_safe_exit_irqrestore(flags);
> >  		if (do_cond_resched)
> >  			cond_resched();
> >  	}
> > +
> > +	/*
> > +	 * If there is an active waiter waiting on the console_lock.
> > +	 * Pass off the printing to the waiter, and the waiter
> > +	 * will continue printing on its CPU, and when all writing
> > +	 * has finished, the last printer will wake up klogd.
> > +	 */
> > +	if (waiter) {
> > +		WRITE_ONCE(console_waiter, false);
> > +		/* The waiter is now free to continue */
> > +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>
> Why don't you remove this release() after relocating the upper one?

The manipulation of "console_waiter" implements the spin_lock that
we are trying to simulate. It is such easy because it is guaranteed
that there is always only one process that tries to get this
fake spin_lock. Also the other waiter releases the spin lock
immediately after it gets it.

I mean that WRITE_ONCE(console_waiter, false) causes that
the simulated spin lock is released here. Also the while-cycle
in vprintk_emit() succeeds. The while-cycle success means
that vprintk_emit() actually acquires the simulated spinlock.

This synchronization is needed to make sure that the two processes
pass the console_lock ownership at the right place.

I think that at least this simulated spin lock is annotated the right
way by console_owner_dep_map manipulations. And I think that we
do not need the cross-release feature to simulate this spin lock.

> > +		/*
> > +		 * Hand off console_lock to waiter. The waiter will perform
> > +		 * the up(). After this, the waiter is the console_lock owner.
> > +		 */
> > +		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);

The cross-release feature might be needed here. The above annotation
says that the semaphore is released here. In reality, it is released
in the process that calls vprintk_emit(). We actually just passed the
ownership here.

Does this make any sense? Could we do better using the existing
lockdep annotations?

If you have a better solution, it might make sense to send a patch
on top of linux-next. There is a commit that moved this code
into three helper functions:

	console_lock_spinning_enable()
	console_lock_spinning_disable_and_check()
	console_trylock_spinning()

See
https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h=for-4.16-console-waiter-logic&id=c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab

Best Regards,
Petr

> > +		printk_safe_exit_irqrestore(flags);
> > +		/* Note, if waiter is set, logbuf_lock is not held */
> > +		return;
> > +	}
> > +
> >  	console_locked = 0;
> >  	/* Release the exclusive_console once it is used */
> >
>
> --
> Thanks,
> Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
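
The hand-off discussed in the message above can be sketched as a small single-threaded userspace model in C. This is only an illustration under stated assumptions, not the kernel code: `struct console_model`, `model_trylock()`, `model_try_wait()` and `model_handoff()` are hypothetical names, tasks are plain integers, and the spinning waiter is represented by a flag rather than a real busy loop on another CPU. It shows the two rules from the patch description: at most one task may register as the waiter, and the semaphore is never released during the hand-off, only its ownership moves.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_TASK (-1)

/* Hypothetical model of the shared state; not the kernel's data layout. */
struct console_model {
	int  sem_holder;   /* stands in for console_sem; NO_TASK when free */
	int  owner;        /* stands in for console_owner */
	bool waiter;       /* stands in for console_waiter */
};

/* Models console_trylock(): succeeds only when the semaphore is free. */
static bool model_trylock(struct console_model *m, int task)
{
	if (m->sem_holder != NO_TASK)
		return false;
	m->sem_holder = task;
	return true;
}

/* Models the new else-branch: become the single waiter if allowed. */
static bool model_try_wait(struct console_model *m, int task)
{
	if (m->waiter || m->owner == NO_TASK || m->owner == task)
		return false;
	m->waiter = true;
	return true;
}

/*
 * Task 0 prints `total` messages. Task 1 calls printk() while message
 * number `arrival` (0-based) is being written. Returns the number of
 * messages task 0 wrote before passing the rest to task 1.
 */
int model_handoff(int total, int arrival)
{
	struct console_model m = { NO_TASK, NO_TASK, false };
	bool waiter = false;
	int printed = 0;

	if (!model_trylock(&m, 0))
		return -1;

	while (printed < total) {
		m.owner = 0;                    /* console_owner = current */
		if (printed == arrival && !model_trylock(&m, 1))
			model_try_wait(&m, 1);  /* task 1 would now spin */
		printed++;                      /* call_console_drivers() */
		waiter = m.waiter;
		m.owner = NO_TASK;              /* console_owner = NULL */
		if (waiter)
			break;
	}

	if (waiter) {
		assert(m.sem_holder == 0);  /* still held: no up() happened */
		m.waiter = false;           /* lets the spinner continue */
		m.sem_holder = 1;           /* ownership moves to the waiter */
	} else {
		m.sem_holder = NO_TASK;     /* ordinary release */
	}
	return printed;
}
```

Under these assumptions, `model_handoff(5, 2)` returns 3: task 0 finishes the message it was writing when task 1 arrived, and task 1 inherits the remaining two.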
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
2018-01-17 12:04 ` Petr Mladek
@ 2018-01-18 1:53 ` Byungchul Park
2018-01-18 1:57 ` Byungchul Park
2018-01-18 2:19 ` Steven Rostedt
0 siblings, 2 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-18 1:53 UTC (permalink / raw)
To: Petr Mladek
Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/17/2018 9:04 PM, Petr Mladek wrote:
> On Wed 2018-01-17 11:19:53, Byungchul Park wrote:
>> On 1/10/2018 10:24 PM, Petr Mladek wrote:
>>> From: Steven Rostedt <rostedt@goodmis.org>

[...]

>>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>>> index b9006617710f..7e6459abba43 100644
>>> --- a/kernel/printk/printk.c
>>> +++ b/kernel/printk/printk.c
>>> @@ -1753,8 +1760,56 @@ asmlinkage int vprintk_emit(int facility, int level,
>>>  		 * semaphore. The release will print out buffers and wake up
>>>  		 * /dev/kmsg and syslog() users.
>>>  		 */
>>> -		if (console_trylock())
>>> +		if (console_trylock()) {
>>>  			console_unlock();
>>> +		} else {
>>> +			struct task_struct *owner = NULL;
>>> +			bool waiter;
>>> +			bool spin = false;
>>> +
>>> +			printk_safe_enter_irqsave(flags);
>>> +
>>> +			raw_spin_lock(&console_owner_lock);
>>> +			owner = READ_ONCE(console_owner);
>>> +			waiter = READ_ONCE(console_waiter);
>>> +			if (!waiter && owner && owner != current) {
>>> +				WRITE_ONCE(console_waiter, true);
>>> +				spin = true;
>>> +			}
>>> +			raw_spin_unlock(&console_owner_lock);
>>> +
>>> +			/*
>>> +			 * If there is an active printk() writing to the
>>> +			 * consoles, instead of having it write our data too,
>>> +			 * see if we can offload that load from the active
>>> +			 * printer, and do some printing ourselves.
>>> +			 * Go into a spin only if there isn't already a waiter
>>> +			 * spinning, and there is an active printer, and
>>> +			 * that active printer isn't us (recursive printk?).
>>> +			 */
>>> +			if (spin) {
>>> +				/* We spin waiting for the owner to release us */
>>> +				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>> +				/* Owner will clear console_waiter on hand off */
>>> +				while (READ_ONCE(console_waiter))
>>> +					cpu_relax();
>>> +
>>> +				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you move this over "while (READ_ONCE(console_waiter))" and
>> right after acquire()?
>>
>> As I said last time, only acquisitions between acquire() and release()
>> are meaningful. Are you taking care of acquisitions within cpu_relax()?
>> If so, leave it.
>
> We are simulating a spinlock here. The above code corresponds to
>
> 	spin_lock(&console_owner_spin_lock);
> 	spin_unlock(&console_owner_spin_lock);
>
> I mean that spin_acquire() + while-cycle corresponds
> to spin_lock(). And spin_release() corresponds to
> spin_unlock().

Hello,

This is a thing simulating a wait for an event e.g.
wait_for_completion() doing spinning instead of sleep, rather
than a spinlock. I mean:

   This context
   ------------
   while (READ_ONCE(console_waiter)) /* Wait for the event */
      cpu_relax();

   Another context
   ---------------
   WRITE_ONCE(console_waiter, false); /* Event */

That's why I said this's the exact case of cross-release. Anyway
without cross-release, we usually use typical acquire/release
pairs to cover a wait for an event in the following way:

   A context
   ---------
   lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
                           /* Read one is better though.. */

   /* A section, we suspect, a wait for an event might happen. */
   ...
   lock_map_release(wait);


   The place actually doing the wait
   ---------------------------------
   lock_map_acquire(wait);
   lock_map_acquire(wait);

   wait_for_event(wait); /* Actually do the wait */

You can see a simple example of how to use them by searching
kernel/cpu.c with "lock_acquire" and "wait_for_completion".

However, as I said, if you suspect that cpu_relax() includes
the wait, then it's ok to leave it. Otherwise, I think it
would be better to change it in the way I showed you above.

>>> +				printk_safe_exit_irqrestore(flags);
>>> +
>>> +				/*
>>> +				 * The owner passed the console lock to us.
>>> +				 * Since we did not spin on console lock, annotate
>>> +				 * this as a trylock. Otherwise lockdep will
>>> +				 * complain.
>>> +				 */
>>> +				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
>>> +				console_unlock();
>>> +				printk_safe_enter_irqsave(flags);
>>> +			}
>>> +			printk_safe_exit_irqrestore(flags);
>>> +
>>> +		}
>>>  	}
>>>  	return printed_len;
>>> @@ -2141,6 +2196,7 @@ void console_unlock(void)
>>>  	static u64 seen_seq;
>>>  	unsigned long flags;
>>>  	bool wake_klogd = false;
>>> +	bool waiter = false;
>>>  	bool do_cond_resched, retry;
>>>  	if (console_suspended) {
>>> @@ -2229,14 +2285,64 @@ void console_unlock(void)
>>>  		console_seq++;
>>>  		raw_spin_unlock(&logbuf_lock);
>>> +		/*
>>> +		 * While actively printing out messages, if another printk()
>>> +		 * were to occur on another CPU, it may wait for this one to
>>> +		 * finish. This task can not be preempted if there is a
>>> +		 * waiter waiting to take over.
>>> +		 */
>>> +		raw_spin_lock(&console_owner_lock);
>>> +		console_owner = current;
>>> +		raw_spin_unlock(&console_owner_lock);
>>> +
>>> +		/* The waiter may spin on us after setting console_owner */
>>> +		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
>>> +
>>>  		stop_critical_timings();	/* don't trace print latency */
>>>  		call_console_drivers(ext_text, ext_len, text, len);
>>>  		start_critical_timings();
>>> +
>>> +		raw_spin_lock(&console_owner_lock);
>>> +		waiter = READ_ONCE(console_waiter);
>>> +		console_owner = NULL;
>>> +		raw_spin_unlock(&console_owner_lock);
>>> +
>>> +		/*
>>> +		 * If there is a waiter waiting for us, then pass the
>>> +		 * rest of the work load over to that waiter.
>>> +		 */
>>> +		if (waiter)
>>> +			break;
>>> +
>>> +		/* There was no waiter, and nothing will spin on us here */
>>> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you move this over "if (waiter)"?
>
> We want to actually release the lock before calling spin_release,
> see below.

Excuse me but, I don't see..

>>> +
>>>  		printk_safe_exit_irqrestore(flags);
>>>  		if (do_cond_resched)
>>>  			cond_resched();
>>>  	}
>>> +
>>> +	/*
>>> +	 * If there is an active waiter waiting on the console_lock.
>>> +	 * Pass off the printing to the waiter, and the waiter
>>> +	 * will continue printing on its CPU, and when all writing
>>> +	 * has finished, the last printer will wake up klogd.
>>> +	 */
>>> +	if (waiter) {
>>> +		WRITE_ONCE(console_waiter, false);
>>> +		/* The waiter is now free to continue */
>>> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
>>
>> Why don't you remove this release() after relocating the upper one?

You should use this acquire/release pair here to detect if the
following section involves the spinning again for console_waiter:

   stop_critical_timings();
   call_console_drivers(ext_text, ext_len, text, len);
   start_critical_timings();

   raw_spin_lock(&console_owner_lock);
   waiter = READ_ONCE(console_waiter);
   console_owner = NULL;
   raw_spin_unlock(&console_owner_lock);

There should be no more meaning than that.

> The manipulation of "console_waiter" implements the spin_lock that
> we are trying to simulate. It is such easy because it is guaranteed
> that there is always only one process that tries to get this
> fake spin_lock. Also the other waiter releases the spin lock
> immediately after it gets it.
>
> I mean that WRITE_ONCE(console_waiter, false) causes that
> the simulated spin lock is released here. Also the while-cycle
> in vprintk_emit() succeeds. The while-cycle success means
> that vprintk_emit() actually acquires the simulated spinlock.

I understand what you want to explain. If cross-release was alive,
there might be several things to talk more but now, what I explained
above is all we can do with existing acquire/release.

> This synchronization is need to make sure that the two processes
> pass the console_lock ownership at the right place.
>
> I think that at least this simulated spin lock is annotated the right
> way by console_owner_dep_map manipulations. And I think that we

I also think it would work logically. I just wanted to say the code
looks like as if it's doing something cross-release stuff, despite
not, and suggest a common way to use typical ones. That's all. :)

I would send a patch if you also think so, but it's ok even if not.

> do not need the cross-release feature to simulate this spin lock.
>
>
>>> +		/*
>>> +		 * Hand off console_lock to waiter. The waiter will perform
>>> +		 * the up(). After this, the waiter is the console_lock owner.
>>> +  */
>>> + mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
>
> The cross-release feature might be needed here. The above annotation
> says that the semaphore is release here. In reality, it is released

Yeah, cross-release might be needed here, but it won't be that simple
anyway.

> in the process that calls vprintk_emit(). We actually just passed the
> ownership here.
>
> Does this make any sense? Could we do better using the existing
> lockdep annotations?

I wonder what you think about the things I told you. Could you let me
know?

> If you have a better solution, it might make sense to send a patch
> on top of linux-next. There is a commit that moved these code
> into three helper functions:

I would after getting your feedback. Thanks a lot.

> console_lock_spinning_enable()
> console_lock_spinning_disable_and_check()
> console_trylock_spinning()
>
> See
> https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/commit/?h=for-4.16-console-waiter-logic&id=c162d5b4338d72deed61aa65ed0f2f4ba2bbc8ab
>
> Best Regards,
> Petr
>
>>> + printk_safe_exit_irqrestore(flags);
>>> + /* Note, if waiter is set, logbuf_lock is not held */
>>> + return;
>>> + }
>>> +
>>> console_locked = 0;
>>> /* Release the exclusive_console once it is used */
>>>
>>
>> --
>> Thanks,
>> Byungchul
>

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes
2018-01-18 1:53 ` Byungchul Park
@ 2018-01-18 1:57 ` Byungchul Park
2018-01-18 2:19 ` Steven Rostedt
1 sibling, 0 replies; 140+ messages in thread
From: Byungchul Park @ 2018-01-18 1:57 UTC (permalink / raw)
To: Petr Mladek
Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel, kernel-team

On 1/18/2018 10:53 AM, Byungchul Park wrote:
> Hello,
>
> This is a thing simulating a wait for an event e.g.
> wait_for_completion() doing spinning instead of sleep, rather
> than a spinlock. I mean:
>
> This context
> ------------
> while (READ_ONCE(console_waiter)) /* Wait for the event */
> cpu_relax();
>
> Another context
> ---------------
> WRITE_ONCE(console_waiter, false); /* Event */
>
> That's why I said this's the exact case of cross-release. Anyway
> without cross-release, we usually use typical acquire/release
> pairs to cover a wait for an event in the following way:
>
> A context
> ---------
> lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */
> /* Read one is better though.. */
>
> /* A section, we suspect, a wait for an event might happen. */
> ...
> lock_map_release(wait);
>
>
> The place actually doing the wait
> ---------------------------------
> lock_map_acquire(wait);
> lock_map_acquire(wait);

  ^
  lock_map_release(wait);

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-18 1:53 ` Byungchul Park 2018-01-18 1:57 ` Byungchul Park @ 2018-01-18 2:19 ` Steven Rostedt 2018-01-18 4:01 ` Byungchul Park 1 sibling, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-18 2:19 UTC (permalink / raw) To: Byungchul Park Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On Thu, 18 Jan 2018 10:53:37 +0900 Byungchul Park <byungchul.park@lge.com> wrote: > Hello, > > This is a thing simulating a wait for an event e.g. > wait_for_completion() doing spinning instead of sleep, rather > than a spinlock. I mean: > > This context > ------------ > while (READ_ONCE(console_waiter)) /* Wait for the event */ > cpu_relax(); > > Another context > --------------- > WRITE_ONCE(console_waiter, false); /* Event */ I disagree. It is like a spinlock. You can say a spinlock() that is blocked is also waiting for an event. That event being the owner does a spin_unlock(). > > That's why I said this's the exact case of cross-release. Anyway > without cross-release, we usually use typical acquire/release > pairs to cover a wait for an event in the following way: > > A context > --------- > lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ > /* Read one is better though.. */ > > /* A section, we suspect, a wait for an event might happen. */ > ... > lock_map_release(wait); > > > The place actually doing the wait > --------------------------------- > lock_map_acquire(wait); > lock_map_acquire(wait); > > wait_for_event(wait); /* Actually do the wait */ > > You can see a simple example of how to use them by searching > kernel/cpu.c with "lock_acquire" and "wait_for_completion". 
>
> However, as I said, if you suspect that cpu_relax() includes
> the wait, then it's ok to leave it. Otherwise, I think it
> would be better to change it in the way I showed you above.

I find your way confusing. I'm simulating a spinlock not a wait for
completion. A wait for completion usually initiates something then
waits for it to complete. This is trying to get into a critical area
but another task is currently in it. It's simulating a spinlock as far
as I can see.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-18 2:19 ` Steven Rostedt @ 2018-01-18 4:01 ` Byungchul Park 2018-01-18 15:21 ` Steven Rostedt 0 siblings, 1 reply; 140+ messages in thread From: Byungchul Park @ 2018-01-18 4:01 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On 1/18/2018 11:19 AM, Steven Rostedt wrote: > On Thu, 18 Jan 2018 10:53:37 +0900 > Byungchul Park <byungchul.park@lge.com> wrote: > >> Hello, >> >> This is a thing simulating a wait for an event e.g. >> wait_for_completion() doing spinning instead of sleep, rather >> than a spinlock. I mean: >> >> This context >> ------------ >> while (READ_ONCE(console_waiter)) /* Wait for the event */ >> cpu_relax(); >> >> Another context >> --------------- >> WRITE_ONCE(console_waiter, false); /* Event */ > > I disagree. It is like a spinlock. You can say a spinlock() that is > blocked is also waiting for an event. That event being the owner does a > spin_unlock(). That's exactly what I was saying. Excuse me but, I don't understand what you want to say. Could you explain more? What do you disagree? >> >> That's why I said this's the exact case of cross-release. Anyway >> without cross-release, we usually use typical acquire/release >> pairs to cover a wait for an event in the following way: >> >> A context >> --------- >> lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ >> /* Read one is better though.. */ >> >> /* A section, we suspect, a wait for an event might happen. */ >> ... 
>> lock_map_release(wait); >> >> >> The place actually doing the wait >> --------------------------------- >> lock_map_acquire(wait); >> lock_map_acquire(wait); >> >> wait_for_event(wait); /* Actually do the wait */ >> >> You can see a simple example of how to use them by searching >> kernel/cpu.c with "lock_acquire" and "wait_for_completion". >> >> However, as I said, if you suspect that cpu_relax() includes >> the wait, then it's ok to leave it. Otherwise, I think it >> would be better to change it in the way I showed you above. > > I find your way confusing. I'm simulating a spinlock not a wait for > completion. A wait for completion usually initiates something then I used the word, *event* instead of *completion*. wait_for_completion() and complete() are just an example of a pair of waiter and event. Lock and unlock can also be another example, too. Important thing is that who waits and who triggers the event. Using the pair, we can achieve various things, for examples: 1. Synchronization like wait_for_completion() does. 2. Control exclusively entering into a critical area. 3. Whatever. > waits for it to complete. This is trying to get into a critical area > but another task is currently in it. It's simulating a spinlock as far > as I can see. Anyway it's an example of "waiter for an event, and the event". JFYI, spinning or sleeping does not matter. Those are just methods to achieve a wait. I know you're not talking about this though. It's JFYI. -- Thanks, Byungchul ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-18 4:01 ` Byungchul Park @ 2018-01-18 15:21 ` Steven Rostedt 2018-01-19 2:37 ` Byungchul Park 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-18 15:21 UTC (permalink / raw) To: Byungchul Park Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On Thu, 18 Jan 2018 13:01:46 +0900 Byungchul Park <byungchul.park@lge.com> wrote: > > I disagree. It is like a spinlock. You can say a spinlock() that is > > blocked is also waiting for an event. That event being the owner does a > > spin_unlock(). > > That's exactly what I was saying. Excuse me but, I don't understand > what you want to say. Could you explain more? What do you disagree? I guess I'm confused at what you are asking for then. > > I find your way confusing. I'm simulating a spinlock not a wait for > > completion. A wait for completion usually initiates something then > > I used the word, *event* instead of *completion*. wait_for_completion() > and complete() are just an example of a pair of waiter and event. > Lock and unlock can also be another example, too. > > Important thing is that who waits and who triggers the event. Using the > pair, we can achieve various things, for examples: > > 1. Synchronization like wait_for_completion() does. > 2. Control exclusively entering into a critical area. > 3. Whatever. > > > waits for it to complete. This is trying to get into a critical area > > but another task is currently in it. It's simulating a spinlock as far > > as I can see. > > Anyway it's an example of "waiter for an event, and the event". > > JFYI, spinning or sleeping does not matter. Those are just methods to > achieve a wait. 
I know you're not talking about this though. It's JFYI.

OK, if it is just FYI.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-18 15:21 ` Steven Rostedt @ 2018-01-19 2:37 ` Byungchul Park 2018-01-19 3:27 ` Steven Rostedt 0 siblings, 1 reply; 140+ messages in thread From: Byungchul Park @ 2018-01-19 2:37 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On 1/19/2018 12:21 AM, Steven Rostedt wrote: > On Thu, 18 Jan 2018 13:01:46 +0900 > Byungchul Park <byungchul.park@lge.com> wrote: > >>> I disagree. It is like a spinlock. You can say a spinlock() that is >>> blocked is also waiting for an event. That event being the owner does a >>> spin_unlock(). >> >> That's exactly what I was saying. Excuse me but, I don't understand >> what you want to say. Could you explain more? What do you disagree? > > I guess I'm confused at what you are asking for then. Sorry for not enough explanation. What I asked you for is: 1. Relocate acquire()s/release()s. 2. So make it simpler and remove unnecessary one. 3. So make it look like the following form, because it's a thing simulating "wait and event". A context --------- lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ /* "Read" one is better though.. */ /* A section, we suspect a wait for an event might happen. */ ... lock_map_release(wait); The place actually doing the wait --------------------------------- lock_map_acquire(wait); lock_map_release(wait); wait_for_event(wait); /* Actually do the wait */ Honestly, you used acquire()s/release()s as if they are cross- release stuff which mainly handles general waits and events, not only things doing "acquire -> critical area -> release". But that's not in the mainline at the moment. >>> I find your way confusing. 
I'm simulating a spinlock not a wait for >>> completion. A wait for completion usually initiates something then >> >> I used the word, *event* instead of *completion*. wait_for_completion() >> and complete() are just an example of a pair of waiter and event. >> Lock and unlock can also be another example, too. >> >> Important thing is that who waits and who triggers the event. Using the >> pair, we can achieve various things, for examples: >> >> 1. Synchronization like wait_for_completion() does. >> 2. Control exclusively entering into a critical area. >> 3. Whatever. >> >>> waits for it to complete. This is trying to get into a critical area >>> but another task is currently in it. It's simulating a spinlock as far >>> as I can see. >> >> Anyway it's an example of "waiter for an event, and the event". >> >> JFYI, spinning or sleeping does not matter. Those are just methods to ^ whether spining or sleeping doesn't matter. >> achieve a wait. I know you're not talking about this though. It's JFYI. > > OK, if it is just FYI. Actually, the last paragraph is JFYI tho. > -- Steve > > > -- Thanks, Byungchul ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-19 2:37 ` Byungchul Park @ 2018-01-19 3:27 ` Steven Rostedt 2018-01-22 2:31 ` Byungchul Park 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-19 3:27 UTC (permalink / raw) To: Byungchul Park Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On Fri, 19 Jan 2018 11:37:13 +0900 Byungchul Park <byungchul.park@lge.com> wrote: > On 1/19/2018 12:21 AM, Steven Rostedt wrote: > > On Thu, 18 Jan 2018 13:01:46 +0900 > > Byungchul Park <byungchul.park@lge.com> wrote: > > > >>> I disagree. It is like a spinlock. You can say a spinlock() that is > >>> blocked is also waiting for an event. That event being the owner does a > >>> spin_unlock(). > >> > >> That's exactly what I was saying. Excuse me but, I don't understand > >> what you want to say. Could you explain more? What do you disagree? > > > > I guess I'm confused at what you are asking for then. > > Sorry for not enough explanation. What I asked you for is: > > 1. Relocate acquire()s/release()s. > 2. So make it simpler and remove unnecessary one. > 3. So make it look like the following form, > because it's a thing simulating "wait and event". > > A context > --------- > lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ > /* "Read" one is better though.. */ why? I'm assuming you are talking about adding this to the current owner off the console_owner? This is a mutually exclusive section, no parallel access. Why the Read? > > /* A section, we suspect a wait for an event might happen. */ > ... 
> > lock_map_release(wait); > > The place actually doing the wait > --------------------------------- > lock_map_acquire(wait); > lock_map_release(wait); > > wait_for_event(wait); /* Actually do the wait */ > > Honestly, you used acquire()s/release()s as if they are cross- > release stuff which mainly handles general waits and events, > not only things doing "acquire -> critical area -> release". > But that's not in the mainline at the moment. Maybe it is more like that. Because, the thing I'm doing is passing off a semaphore ownership to the waiter. >From a previous email: > > + if (spin) { > > + /* We spin waiting for the owner to release us */ > > + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); > > + /* Owner will clear console_waiter on hand off */ > > + while (READ_ONCE(console_waiter)) > > + cpu_relax(); > > + > > + spin_release(&console_owner_dep_map, 1, _THIS_IP_); > > Why don't you move this over "while (READ_ONCE(console_waiter))" and > right after acquire()? > > As I said last time, only acquisitions between acquire() and release() > are meaningful. Are you taking care of acquisitions within cpu_relax()? > If so, leave it. There is no acquisitions between acquire and release. To get to "if (spin)" the acquire had to already been done. If it was released, this spinner is now the new "owner". There's no race with anyone else. But it doesn't technically have it till console_waiter is set to NULL. Why would we call release() before that? Or maybe I'm missing something. Or are you just saying that it doesn't matter if it is before or after the while() loop, to just put it before? Does it really matter? -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes 2018-01-19 3:27 ` Steven Rostedt @ 2018-01-22 2:31 ` Byungchul Park 0 siblings, 0 replies; 140+ messages in thread From: Byungchul Park @ 2018-01-22 2:31 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, kernel-team On 1/19/2018 12:27 PM, Steven Rostedt wrote: > On Fri, 19 Jan 2018 11:37:13 +0900 > Byungchul Park <byungchul.park@lge.com> wrote: > >> On 1/19/2018 12:21 AM, Steven Rostedt wrote: >>> On Thu, 18 Jan 2018 13:01:46 +0900 >>> Byungchul Park <byungchul.park@lge.com> wrote: >>> >>>>> I disagree. It is like a spinlock. You can say a spinlock() that is >>>>> blocked is also waiting for an event. That event being the owner does a >>>>> spin_unlock(). >>>> >>>> That's exactly what I was saying. Excuse me but, I don't understand >>>> what you want to say. Could you explain more? What do you disagree? >>> >>> I guess I'm confused at what you are asking for then. >> >> Sorry for not enough explanation. What I asked you for is: >> >> 1. Relocate acquire()s/release()s. >> 2. So make it simpler and remove unnecessary one. >> 3. So make it look like the following form, >> because it's a thing simulating "wait and event". >> >> A context >> --------- >> lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ >> /* "Read" one is better though.. */ > > why? 
I'm assuming you are talking about adding this to the current It was about console_unlock()'s body that is: + /* The waiter may spin on us after setting console_owner */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); ^^^^^^^^^^^^ + stop_critical_timings(); /* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); + + raw_spin_lock(&console_owner_lock); + waiter = READ_ONCE(console_waiter); + console_owner = NULL; + raw_spin_unlock(&console_owner_lock); + + /* + * If there is a waiter waiting for us, then pass the + * rest of the work load over to that waiter. + */ + if (waiter) + break; + + /* There was no waiter, and nothing will spin on us here */ + spin_release(&console_owner_dep_map, 1, _THIS_IP_); ^^^^^^^^^^^^ I recommand to move this over the "if" statament. + printk_safe_exit_irqrestore(flags); if (do_cond_resched) cond_resched(); } + + /* + * If there is an active waiter waiting on the console_lock. + * Pass off the printing to the waiter, and the waiter + * will continue printing on its CPU, and when all writing + * has finished, the last printer will wake up klogd. + */ + if (waiter) { + WRITE_ONCE(console_waiter, false); + /* The waiter is now free to continue */ + spin_release(&console_owner_dep_map, 1, _THIS_IP_); ^^^^^^^^^^^^ I recommand to remove this. > owner off the console_owner? This is a mutually exclusive section, no > parallel access. Why the Read? Not much matter whether to use the read version or not. Let me explain it more since you asked. (I don't stongly insist to use the read version tho.) For example: A context (context A) --------------------- lock_map_acquire(wait); /* Or lock_map_acquire_read(wait) */ /* "Read" one is better though.. */ /* A section, we suspect a wait for the event might happen. */ ... 
lock_map_release(wait); trigger the event; The place actually doing the wait (context B) --------------------------------------------- lock_map_acquire(wait); lock_map_release(wait); wait_for_event(wait); /* Actually do the wait */ The acquire() in context A is not a real acquisition but only for detecting if a wait is in the section, which means that should not interact with another pseudo acqusition but only with real waits. lock_map_acquire_read() makes it done as we expect. That's why I said 'read' one is better. But it's ok to use normal(write) one. (I'm not sure if Peterz finished making the 'read' work well, tho.) >> >> /* A section, we suspect a wait for an event might happen. */ >> ... >> >> lock_map_release(wait); >> >> The place actually doing the wait >> --------------------------------- >> lock_map_acquire(wait); >> lock_map_release(wait); >> >> wait_for_event(wait); /* Actually do the wait */ >> >> Honestly, you used acquire()s/release()s as if they are cross- >> release stuff which mainly handles general waits and events, >> not only things doing "acquire -> critical area -> release". >> But that's not in the mainline at the moment. > > Maybe it is more like that. Because, the thing I'm doing is passing off > a semaphore ownership to the waiter. > > From a previous email: > >>> + if (spin) { >>> + /* We spin waiting for the owner to release us */ >>> + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); >>> + /* Owner will clear console_waiter on hand off */ >>> + while (READ_ONCE(console_waiter)) >>> + cpu_relax(); >>> + >>> + spin_release(&console_owner_dep_map, 1, _THIS_IP_); >> >> Why don't you move this over "while (READ_ONCE(console_waiter))" and >> right after acquire()? >> >> As I said last time, only acquisitions between acquire() and release() >> are meaningful. Are you taking care of acquisitions within cpu_relax()? >> If so, leave it. > > There is no acquisitions between acquire and release. 
To get to
> "if (spin)" the acquire had to already been done. If it was released,
> this spinner is now the new "owner". There's no race with anyone else.
> But it doesn't technically have it till console_waiter is set to NULL.
> Why would we call release() before that? Or maybe I'm missing something.
>
> Or are you just saying that it doesn't matter if it is before or after
> the while() loop, to just put it before? Does it really matter?

It doesn't matter. As I said, there's logically no problem with it.
Leave the code if you want to locate those that way. I just started
to mention it because some lines can be removed with the code a bit
fixed.

>
> -- Steve
>

--
Thanks,
Byungchul

^ permalink raw reply [flat|nested] 140+ messages in thread
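The "acquire/release around the section, acquire/release right before the
wait" pattern discussed above can be illustrated with a toy model. This is
only a sketch under loud assumptions: it is plain userspace C, it is not
lockdep, and every name in it (toy_lockdep, toy_ctx, toy_acquire,
toy_release) is hypothetical. It only records "class A was held while class
B was acquired" edges and reports when a new edge would close a cycle, which
is roughly the kind of inversion the annotations are meant to expose.

```c
#include <stdbool.h>

#define TOY_MAX_CLASSES 4
#define TOY_MAX_HELD    4

/* Hypothetical toy stand-in for lockdep: edge[a][b] means
 * "class a was held while class b was acquired". */
struct toy_lockdep {
	bool edge[TOY_MAX_CLASSES][TOY_MAX_CLASSES];
};

/* One execution context and the classes it currently holds. */
struct toy_ctx {
	int held[TOY_MAX_HELD];
	int nheld;
};

/* Depth-first search: is "to" reachable from "from" along recorded edges? */
static bool toy_reachable(const struct toy_lockdep *ld, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_MAX_CLASSES; next++)
		if (ld->edge[from][next] && !seen[next] &&
		    toy_reachable(ld, next, to, seen))
			return true;
	return false;
}

/* Acquire class "cls" in "ctx". Returns false when one of the new
 * "held -> cls" edges would close a cycle (a possible deadlock). */
static bool toy_acquire(struct toy_lockdep *ld, struct toy_ctx *ctx, int cls)
{
	bool ok = true;

	for (int i = 0; i < ctx->nheld; i++) {
		bool seen[TOY_MAX_CLASSES] = { false };

		/* An existing path cls -> held[i] means the order is inverted. */
		if (toy_reachable(ld, cls, ctx->held[i], seen))
			ok = false;
		ld->edge[ctx->held[i]][cls] = true;
	}
	if (ctx->nheld < TOY_MAX_HELD)
		ctx->held[ctx->nheld++] = cls;
	return ok;
}

/* Drop class "cls" from the set held by "ctx"; recorded edges stay. */
static void toy_release(struct toy_ctx *ctx, int cls)
{
	for (int i = 0; i < ctx->nheld; i++) {
		if (ctx->held[i] == cls) {
			ctx->held[i] = ctx->held[--ctx->nheld];
			return;
		}
	}
}
```

In this model, a context that takes some lock X inside the annotated section
records "wait -> X". A second context that holds X and then does the
acquire/release pair on "wait" just before spinning would add "X -> wait",
and toy_acquire() returns false for it.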
* [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek 2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek @ 2018-01-10 13:24 ` Petr Mladek 2018-01-10 17:52 ` Steven Rostedt 2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo 2 siblings, 1 reply; 140+ messages in thread From: Petr Mladek @ 2018-01-10 13:24 UTC (permalink / raw) To: Steven Rostedt, Sergey Senozhatsky Cc: akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel, Petr Mladek The commit ("printk: Add console owner and waiter logic to load balance console writes") made vprintk_emit() and console_unlock() even more complicated. This patch extracts the new code into 3 helper functions. They should help to keep it rather self-contained. It will be easier to use and maintain. This patch just shuffles the existing code. It does not change the functionality. 
Signed-off-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++-------------------- 1 file changed, 145 insertions(+), 97 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7e6459abba43..6217c280e6c1 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); static struct lockdep_map console_lock_dep_map = { .name = "console_lock" }; -static struct lockdep_map console_owner_dep_map = { - .name = "console_owner" -}; #endif -static DEFINE_RAW_SPINLOCK(console_owner_lock); -static struct task_struct *console_owner; -static bool console_waiter; - enum devkmsg_log_bits { __DEVKMSG_LOG_BIT_ON = 0, __DEVKMSG_LOG_BIT_OFF, @@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) } /* + * Special console_lock variants that help to reduce the risk of soft-lockups. + * They allow to pass console_lock to another printk() call using a busy wait. + */ + +#ifdef CONFIG_LOCKDEP +static struct lockdep_map console_owner_dep_map = { + .name = "console_owner" +}; +#endif + +static DEFINE_RAW_SPINLOCK(console_owner_lock); +static struct task_struct *console_owner; +static bool console_waiter; + +/** + * console_lock_spinning_enable - mark beginning of code where another + * thread might safely busy wait + * + * This might be called in sections where the current console_lock owner + * cannot sleep. It is a signal that another thread might start busy + * waiting for console_lock. 
+ */ +static void console_lock_spinning_enable(void) +{ + raw_spin_lock(&console_owner_lock); + console_owner = current; + raw_spin_unlock(&console_owner_lock); + + /* The waiter may spin on us after setting console_owner */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); +} + +/** + * console_lock_spinning_disable_and_check - mark end of code where another + * thread was able to busy wait and check if there is a waiter + * + * This is called at the end of section when spinning was enabled by + * console_lock_spinning_enable(). It has two functions. First, it + * is a signal that it is not longer safe to start busy waiting + * for the lock. Second, it checks if there is a busy waiter and + * passes the lock rights to her. + * + * Important: Callers lose the lock if there was the busy waiter. + * They must not longer touch items synchornized by console_lock + * in this case. + * + * Return: 1 if the lock rights were passed, 0 othrewise. + */ +static int console_lock_spinning_disable_and_check(void) +{ + int waiter; + + raw_spin_lock(&console_owner_lock); + waiter = READ_ONCE(console_waiter); + console_owner = NULL; + raw_spin_unlock(&console_owner_lock); + + if (!waiter) { + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + return 0; + } + + /* The waiter is now free to continue */ + WRITE_ONCE(console_waiter, false); + + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + /* + * Hand off console_lock to waiter. The waiter will perform + * the up(). After this, the waiter is the console_lock owner. + */ + mutex_release(&console_lock_dep_map, 1, _THIS_IP_); + return 1; +} + +/** + * console_trylock_spinning - try to get console_lock by busy waiting + * + * This allows to busy wait for the console_lock when the current + * owner is running in a special marked sections. It means that + * the current owner is running and cannot reschedule until it + * is ready to loose the lock. 
+ * + * Return: 1 if we got the lock, 0 othrewise + */ +static int console_trylock_spinning(void) +{ + struct task_struct *owner = NULL; + bool waiter; + bool spin = false; + unsigned long flags; + + printk_safe_enter_irqsave(flags); + + raw_spin_lock(&console_owner_lock); + owner = READ_ONCE(console_owner); + waiter = READ_ONCE(console_waiter); + if (!waiter && owner && owner != current) { + WRITE_ONCE(console_waiter, true); + spin = true; + } + raw_spin_unlock(&console_owner_lock); + + /* + * If there is an active printk() writing to the + * consoles, instead of having it write our data too, + * see if we can offload that load from the active + * printer, and do some printing ourselves. + * Go into a spin only if there isn't already a waiter + * spinning, and there is an active printer, and + * that active printer isn't us (recursive printk?). + */ + if (!spin) { + printk_safe_exit_irqrestore(flags); + return 0; + } + + /* We spin waiting for the owner to release us */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + /* Owner will clear console_waiter on hand off */ + while (READ_ONCE(console_waiter)) + cpu_relax(); + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + printk_safe_exit_irqrestore(flags); + /* + * The owner passed the console lock to us. + * Since we did not spin on console lock, annotate + * this as a trylock. Otherwise lockdep will + * complain. + */ + mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); + + return 1; +} + +/* * Call the console drivers, asking them to write out * log_buf[start] to log_buf[end - 1]. * The console_lock must be held. @@ -1760,56 +1890,8 @@ asmlinkage int vprintk_emit(int facility, int level, * semaphore. The release will print out buffers and wake up * /dev/kmsg and syslog() users. 
*/ - if (console_trylock()) { + if (console_trylock() || console_trylock_spinning()) console_unlock(); - } else { - struct task_struct *owner = NULL; - bool waiter; - bool spin = false; - - printk_safe_enter_irqsave(flags); - - raw_spin_lock(&console_owner_lock); - owner = READ_ONCE(console_owner); - waiter = READ_ONCE(console_waiter); - if (!waiter && owner && owner != current) { - WRITE_ONCE(console_waiter, true); - spin = true; - } - raw_spin_unlock(&console_owner_lock); - - /* - * If there is an active printk() writing to the - * consoles, instead of having it write our data too, - * see if we can offload that load from the active - * printer, and do some printing ourselves. - * Go into a spin only if there isn't already a waiter - * spinning, and there is an active printer, and - * that active printer isn't us (recursive printk?). - */ - if (spin) { - /* We spin waiting for the owner to release us */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); - /* Owner will clear console_waiter on hand off */ - while (READ_ONCE(console_waiter)) - cpu_relax(); - - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - - /* - * The owner passed the console lock to us. - * Since we did not spin on console lock, annotate - * this as a trylock. Otherwise lockdep will - * complain. 
- */ - mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); - console_unlock(); - printk_safe_enter_irqsave(flags); - } - printk_safe_exit_irqrestore(flags); - - } } return printed_len; @@ -1910,6 +1992,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, static ssize_t msg_print_ext_body(char *buf, size_t size, char *dict, size_t dict_len, char *text, size_t text_len) { return 0; } +static void console_lock_spinning_enable(void) { } +static int console_lock_spinning_disable_and_check(void) { return 0; } static void call_console_drivers(const char *ext_text, size_t ext_len, const char *text, size_t len) {} static size_t msg_print_text(const struct printk_log *msg, @@ -2196,7 +2280,6 @@ void console_unlock(void) static u64 seen_seq; unsigned long flags; bool wake_klogd = false; - bool waiter = false; bool do_cond_resched, retry; if (console_suspended) { @@ -2291,31 +2374,16 @@ void console_unlock(void) * finish. This task can not be preempted if there is a * waiter waiting to take over. */ - raw_spin_lock(&console_owner_lock); - console_owner = current; - raw_spin_unlock(&console_owner_lock); - - /* The waiter may spin on us after setting console_owner */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + console_lock_spinning_enable(); stop_critical_timings(); /* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); - raw_spin_lock(&console_owner_lock); - waiter = READ_ONCE(console_waiter); - console_owner = NULL; - raw_spin_unlock(&console_owner_lock); - - /* - * If there is a waiter waiting for us, then pass the - * rest of the work load over to that waiter. 
- */ - if (waiter) - break; - - /* There was no waiter, and nothing will spin on us here */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); + if (console_lock_spinning_disable_and_check()) { + printk_safe_exit_irqrestore(flags); + return; + } printk_safe_exit_irqrestore(flags); @@ -2323,26 +2391,6 @@ void console_unlock(void) cond_resched(); } - /* - * If there is an active waiter waiting on the console_lock. - * Pass off the printing to the waiter, and the waiter - * will continue printing on its CPU, and when all writing - * has finished, the last printer will wake up klogd. - */ - if (waiter) { - WRITE_ONCE(console_waiter, false); - /* The waiter is now free to continue */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - /* - * Hand off console_lock to waiter. The waiter will perform - * the up(). After this, the waiter is the console_lock owner. - */ - mutex_release(&console_lock_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - /* Note, if waiter is set, logbuf_lock is not held */ - return; - } - console_locked = 0; /* Release the exclusive_console once it is used */ -- 2.13.6 ^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek @ 2018-01-10 17:52 ` Steven Rostedt 2018-01-11 12:03 ` Petr Mladek 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 17:52 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 14:24:18 +0100 Petr Mladek <pmladek@suse.com> wrote: > The commit ("printk: Add console owner and waiter logic to load balance > console writes") made vprintk_emit() and console_unlock() even more > complicated. > > This patch extracts the new code into 3 helper functions. They should > help to keep it rather self-contained. It will be easier to use and > maintain. > > This patch just shuffles the existing code. It does not change > the functionality. 
> > Signed-off-by: Petr Mladek <pmladek@suse.com> > --- > kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++-------------------- > 1 file changed, 145 insertions(+), 97 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 7e6459abba43..6217c280e6c1 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); > static struct lockdep_map console_lock_dep_map = { > .name = "console_lock" > }; > -static struct lockdep_map console_owner_dep_map = { > - .name = "console_owner" > -}; > #endif > > -static DEFINE_RAW_SPINLOCK(console_owner_lock); > -static struct task_struct *console_owner; > -static bool console_waiter; > - > enum devkmsg_log_bits { > __DEVKMSG_LOG_BIT_ON = 0, > __DEVKMSG_LOG_BIT_OFF, > @@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) > } > > /* > + * Special console_lock variants that help to reduce the risk of soft-lockups. > + * They allow to pass console_lock to another printk() call using a busy wait. > + */ > + > +#ifdef CONFIG_LOCKDEP > +static struct lockdep_map console_owner_dep_map = { > + .name = "console_owner" > +}; > +#endif > + > +static DEFINE_RAW_SPINLOCK(console_owner_lock); > +static struct task_struct *console_owner; > +static bool console_waiter; > + > +/** > + * console_lock_spinning_enable - mark beginning of code where another > + * thread might safely busy wait > + * > + * This might be called in sections where the current console_lock owner "might be"? It has to be called in sections where the current console_lock owner can not sleep. It's basically saying "console lock is now acting like a spinlock". > + * cannot sleep. It is a signal that another thread might start busy > + * waiting for console_lock. 
> + */ > +static void console_lock_spinning_enable(void) > +{ > + raw_spin_lock(&console_owner_lock); > + console_owner = current; > + raw_spin_unlock(&console_owner_lock); > + > + /* The waiter may spin on us after setting console_owner */ > + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); > +} > + > +/** > + * console_lock_spinning_disable_and_check - mark end of code where another > + * thread was able to busy wait and check if there is a waiter > + * > + * This is called at the end of section when spinning was enabled by > + * console_lock_spinning_enable(). It has two functions. First, it "This is called at the end of the section where spinning is allowed." > + * is a signal that it is not longer safe to start busy waiting "it is no longer safe" > + * for the lock. Second, it checks if there is a busy waiter and > + * passes the lock rights to her. > + * > + * Important: Callers lose the lock if there was the busy waiter. > + * They must not longer touch items synchornized by console_lock "They must not touch items ..." > + * in this case. > + * > + * Return: 1 if the lock rights were passed, 0 othrewise. "otherwise" > + */ > +static int console_lock_spinning_disable_and_check(void) > +{ > + int waiter; > + > + raw_spin_lock(&console_owner_lock); > + waiter = READ_ONCE(console_waiter); > + console_owner = NULL; > + raw_spin_unlock(&console_owner_lock); > + > + if (!waiter) { > + spin_release(&console_owner_dep_map, 1, _THIS_IP_); > + return 0; > + } > + > + /* The waiter is now free to continue */ > + WRITE_ONCE(console_waiter, false); > + > + spin_release(&console_owner_dep_map, 1, _THIS_IP_); > + > + /* > + * Hand off console_lock to waiter. The waiter will perform > + * the up(). After this, the waiter is the console_lock owner. 
> + */ > + mutex_release(&console_lock_dep_map, 1, _THIS_IP_); > + return 1; > +} > + > +/** > + * console_trylock_spinning - try to get console_lock by busy waiting > + * > + * This allows to busy wait for the console_lock when the current > + * owner is running in a special marked sections. It means that > + * the current owner is running and cannot reschedule until it > + * is ready to loose the lock. > + * > + * Return: 1 if we got the lock, 0 othrewise > + */ > +static int console_trylock_spinning(void) > +{ > + struct task_struct *owner = NULL; > + bool waiter; > + bool spin = false; > + unsigned long flags; Can we add here: if (console_trylock()) return 1; And then we can simplify the below from: if (console_trylock() || console_trylock_spinning()) to just if (console_trylock_spinning()) -- Steve > + > + printk_safe_enter_irqsave(flags); > + > + raw_spin_lock(&console_owner_lock); > + owner = READ_ONCE(console_owner); > + waiter = READ_ONCE(console_waiter); > + if (!waiter && owner && owner != current) { > + WRITE_ONCE(console_waiter, true); > + spin = true; > + } > + raw_spin_unlock(&console_owner_lock); > + > + /* > + * If there is an active printk() writing to the > + * consoles, instead of having it write our data too, > + * see if we can offload that load from the active > + * printer, and do some printing ourselves. > + * Go into a spin only if there isn't already a waiter > + * spinning, and there is an active printer, and > + * that active printer isn't us (recursive printk?). > + */ > + if (!spin) { > + printk_safe_exit_irqrestore(flags); > + return 0; > + } > + > + /* We spin waiting for the owner to release us */ > + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); > + /* Owner will clear console_waiter on hand off */ > + while (READ_ONCE(console_waiter)) > + cpu_relax(); > + spin_release(&console_owner_dep_map, 1, _THIS_IP_); > + > + printk_safe_exit_irqrestore(flags); > + /* > + * The owner passed the console lock to us. 
> + * Since we did not spin on console lock, annotate > + * this as a trylock. Otherwise lockdep will > + * complain. > + */ > + mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); > + > + return 1; > +} > + > +/* > * Call the console drivers, asking them to write out > * log_buf[start] to log_buf[end - 1]. > * The console_lock must be held. > @@ -1760,56 +1890,8 @@ asmlinkage int vprintk_emit(int facility, int level, > * semaphore. The release will print out buffers and wake up > * /dev/kmsg and syslog() users. > */ > - if (console_trylock()) { > + if (console_trylock() || console_trylock_spinning()) > console_unlock(); > ^ permalink raw reply [flat|nested] 140+ messages in thread
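
The suggestion in the message above, moving the console_trylock() fast path into console_trylock_spinning(), should not change which callers end up holding the lock. The following is a minimal user-space sketch of that claim, not kernel code: the two flags and every model_* / acquire_* name are hypothetical stand-ins, and the busy-wait path is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel variables. */
static bool lock_free;        /* console_lock is currently uncontended */
static bool handover_pending; /* an owner would hand the lock to a spinner */

/* Models console_trylock(): succeeds only if the lock is free. */
static int model_trylock(void)
{
	if (!lock_free)
		return 0;
	lock_free = false;
	return 1;
}

/* Models only the busy-wait slow path of the v5 helper. */
static int model_spin_for_handover(void)
{
	if (!handover_pending)
		return 0;
	handover_pending = false;
	return 1;
}

/* v5 call site: if (console_trylock() || console_trylock_spinning()) */
static int acquire_v5(void)
{
	return model_trylock() || model_spin_for_handover();
}

/* Suggested shape: the helper tries the fast path itself. */
static int model_trylock_spinning(void)
{
	if (model_trylock())
		return 1;
	return model_spin_for_handover();
}

/* Runs both variants from the same starting state and returns the result. */
static int same_result(bool free_now, bool handover)
{
	int a, b;

	lock_free = free_now;
	handover_pending = handover;
	a = acquire_v5();

	lock_free = free_now;
	handover_pending = handover;
	b = model_trylock_spinning();

	assert(a == b);
	return a;
}
```

Calling same_result() for all four combinations of the two flags exercises every path; the only difference between the variants is where the fast path lives, which is why the call site can shrink to a single helper call.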
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-10 17:52 ` Steven Rostedt @ 2018-01-11 12:03 ` Petr Mladek 2018-01-12 15:37 ` Steven Rostedt 0 siblings, 1 reply; 140+ messages in thread From: Petr Mladek @ 2018-01-11 12:03 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel On Wed 2018-01-10 12:52:20, Steven Rostedt wrote: > On Wed, 10 Jan 2018 14:24:18 +0100 > Petr Mladek <pmladek@suse.com> wrote: > > > The commit ("printk: Add console owner and waiter logic to load balance > > console writes") made vprintk_emit() and console_unlock() even more > > complicated. > > > > This patch extracts the new code into 3 helper functions. They should > > help to keep it rather self-contained. It will be easier to use and > > maintain. > > > > This patch just shuffles the existing code. It does not change > > the functionality. 
> > > > Signed-off-by: Petr Mladek <pmladek@suse.com> > > --- > > kernel/printk/printk.c | 242 +++++++++++++++++++++++++++++-------------------- > > 1 file changed, 145 insertions(+), 97 deletions(-) > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 7e6459abba43..6217c280e6c1 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); > > static struct lockdep_map console_lock_dep_map = { > > .name = "console_lock" > > }; > > -static struct lockdep_map console_owner_dep_map = { > > - .name = "console_owner" > > -}; > > #endif > > > > -static DEFINE_RAW_SPINLOCK(console_owner_lock); > > -static struct task_struct *console_owner; > > -static bool console_waiter; > > - > > enum devkmsg_log_bits { > > __DEVKMSG_LOG_BIT_ON = 0, > > __DEVKMSG_LOG_BIT_OFF, > > @@ -1551,6 +1544,143 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) > > } > > > > /* > > + * Special console_lock variants that help to reduce the risk of soft-lockups. > > + * They allow to pass console_lock to another printk() call using a busy wait. > > + */ > > + > > +#ifdef CONFIG_LOCKDEP > > +static struct lockdep_map console_owner_dep_map = { > > + .name = "console_owner" > > +}; > > +#endif > > + > > +static DEFINE_RAW_SPINLOCK(console_owner_lock); > > +static struct task_struct *console_owner; > > +static bool console_waiter; > > + > > +/** > > + * console_lock_spinning_enable - mark beginning of code where another > > + * thread might safely busy wait > > + * > > + * This might be called in sections where the current console_lock owner > > > "might be"? It has to be called in sections where the current > console_lock owner can not sleep. It's basically saying "console lock is > now acting like a spinlock". I am afraid that both explanations are confusing. Your one sounds like it must be called every time we enter non-preemptive context in console_unlock. What about the following? 
 * This is basically saying that "console lock is now acting like
 * a spinlock". It can be called _only_ in sections where the current
 * console_lock owner could not sleep. Also it must be ready to hand
 * over the lock at the end of the section.

> > + * cannot sleep. It is a signal that another thread might start busy
> > + * waiting for console_lock.
> > + */

All the other changes look good to me. I will use them in the next version.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
2018-01-11 12:03 ` Petr Mladek
@ 2018-01-12 15:37 ` Steven Rostedt
2018-01-12 16:08 ` Petr Mladek
0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 15:37 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 13:03:41 +0100
Petr Mladek <pmladek@suse.com> wrote:

> > > +static DEFINE_RAW_SPINLOCK(console_owner_lock);
> > > +static struct task_struct *console_owner;
> > > +static bool console_waiter;
> > > +
> > > +/**
> > > + * console_lock_spinning_enable - mark beginning of code where another
> > > + * thread might safely busy wait
> > > + *
> > > + * This might be called in sections where the current console_lock owner
> >
> >
> > "might be"? It has to be called in sections where the current
> > console_lock owner can not sleep. It's basically saying "console lock is
> > now acting like a spinlock".
>
> I am afraid that both explanations are confusing. Your one sounds like
> it must be called every time we enter non-preemptive context in
> console_unlock. What about the following?
>
> * This is basically saying that "console lock is now acting like
> * a spinlock". It can be called _only_ in sections where the current
> * console_lock owner could not sleep. Also it must be ready to hand
> * over the lock at the end of the section.

I would reword the above:

 * This basically converts console_lock into a spinlock. This marks
 * the section where the console_lock owner can not sleep, because
 * there may be a waiter spinning (like a spinlock). Also it must be
 * ready to hand over the lock at the end of the section.

>
> > > + * cannot sleep. It is a signal that another thread might start busy
> > > + * waiting for console_lock.
> > > + */
>
> All the other changes look good to me. I will use them in the next version.

Great.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers
2018-01-12 15:37 ` Steven Rostedt
@ 2018-01-12 16:08 ` Petr Mladek
2018-01-12 16:36 ` Steven Rostedt
0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-12 16:08 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky,
Tejun Heo, Pavel Machek, linux-kernel

On Fri 2018-01-12 10:37:54, Steven Rostedt wrote:
> On Thu, 11 Jan 2018 13:03:41 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> > All the other changes look good to me. I will use them in the next version.
>
> Great.

Please, find below the updated version. If I get Ack at least from
Steven and no nack's, I will put it into linux-next next week.


From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Fri, 22 Dec 2017 18:58:46 +0100
Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers

The commit ("printk: Add console owner and waiter logic to load balance
console writes") made vprintk_emit() and console_unlock() even more
complicated.

This patch extracts the new code into 3 helper functions. They should
help to keep it rather self-contained. It will be easier to use and
maintain.

This patch just shuffles the existing code. It does not change
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com> --- Changes against v5: + updated some comments (Steven) + do console_trylock() in console_trylock_spinning() (Steven) kernel/printk/printk.c | 245 +++++++++++++++++++++++++++++-------------------- 1 file changed, 148 insertions(+), 97 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7e6459abba43..3057dbc69b4f 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); static struct lockdep_map console_lock_dep_map = { .name = "console_lock" }; -static struct lockdep_map console_owner_dep_map = { - .name = "console_owner" -}; #endif -static DEFINE_RAW_SPINLOCK(console_owner_lock); -static struct task_struct *console_owner; -static bool console_waiter; - enum devkmsg_log_bits { __DEVKMSG_LOG_BIT_ON = 0, __DEVKMSG_LOG_BIT_OFF, @@ -1551,6 +1544,146 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) } /* + * Special console_lock variants that help to reduce the risk of soft-lockups. + * They allow to pass console_lock to another printk() call using a busy wait. + */ + +#ifdef CONFIG_LOCKDEP +static struct lockdep_map console_owner_dep_map = { + .name = "console_owner" +}; +#endif + +static DEFINE_RAW_SPINLOCK(console_owner_lock); +static struct task_struct *console_owner; +static bool console_waiter; + +/** + * console_lock_spinning_enable - mark beginning of code where another + * thread might safely busy wait + * + * This basically converts console_lock into a spinlock. This marks + * the section where the console_lock owner can not sleep, because + * there may be a waiter spinning (like a spinlock). Also it must be + * ready to hand over the lock at the end of the section. 
+ */ +static void console_lock_spinning_enable(void) +{ + raw_spin_lock(&console_owner_lock); + console_owner = current; + raw_spin_unlock(&console_owner_lock); + + /* The waiter may spin on us after setting console_owner */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); +} + +/** + * console_lock_spinning_disable_and_check - mark end of code where another + * thread was able to busy wait and check if there is a waiter + * + * This is called at the end of the section where spinning is allowed. + * It has two functions. First, it is a signal that it is not longer + * safe to start busy waiting for the lock. Second, it checks if + * there is a busy waiter and passes the lock rights to her. + * + * Important: Callers lose the lock if there was the busy waiter. + * They must not touch items synchronized by console_lock + * in this case. + * + * Return: 1 if the lock rights were passed, 0 otherwise. + */ +static int console_lock_spinning_disable_and_check(void) +{ + int waiter; + + raw_spin_lock(&console_owner_lock); + waiter = READ_ONCE(console_waiter); + console_owner = NULL; + raw_spin_unlock(&console_owner_lock); + + if (!waiter) { + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + return 0; + } + + /* The waiter is now free to continue */ + WRITE_ONCE(console_waiter, false); + + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + /* + * Hand off console_lock to waiter. The waiter will perform + * the up(). After this, the waiter is the console_lock owner. + */ + mutex_release(&console_lock_dep_map, 1, _THIS_IP_); + return 1; +} + +/** + * console_trylock_spinning - try to get console_lock by busy waiting + * + * This allows to busy wait for the console_lock when the current + * owner is running in a special marked sections. It means that + * the current owner is running and cannot reschedule until it + * is ready to loose the lock. 
+ * + * Return: 1 if we got the lock, 0 othrewise + */ +static int console_trylock_spinning(void) +{ + struct task_struct *owner = NULL; + bool waiter; + bool spin = false; + unsigned long flags; + + if (console_trylock()) + return 1; + + printk_safe_enter_irqsave(flags); + + raw_spin_lock(&console_owner_lock); + owner = READ_ONCE(console_owner); + waiter = READ_ONCE(console_waiter); + if (!waiter && owner && owner != current) { + WRITE_ONCE(console_waiter, true); + spin = true; + } + raw_spin_unlock(&console_owner_lock); + + /* + * If there is an active printk() writing to the + * consoles, instead of having it write our data too, + * see if we can offload that load from the active + * printer, and do some printing ourselves. + * Go into a spin only if there isn't already a waiter + * spinning, and there is an active printer, and + * that active printer isn't us (recursive printk?). + */ + if (!spin) { + printk_safe_exit_irqrestore(flags); + return 0; + } + + /* We spin waiting for the owner to release us */ + spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + /* Owner will clear console_waiter on hand off */ + while (READ_ONCE(console_waiter)) + cpu_relax(); + spin_release(&console_owner_dep_map, 1, _THIS_IP_); + + printk_safe_exit_irqrestore(flags); + /* + * The owner passed the console lock to us. + * Since we did not spin on console lock, annotate + * this as a trylock. Otherwise lockdep will + * complain. + */ + mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); + + return 1; +} + +/* * Call the console drivers, asking them to write out * log_buf[start] to log_buf[end - 1]. * The console_lock must be held. @@ -1760,56 +1893,8 @@ asmlinkage int vprintk_emit(int facility, int level, * semaphore. The release will print out buffers and wake up * /dev/kmsg and syslog() users. 
*/ - if (console_trylock()) { + if (console_trylock_spinning()) console_unlock(); - } else { - struct task_struct *owner = NULL; - bool waiter; - bool spin = false; - - printk_safe_enter_irqsave(flags); - - raw_spin_lock(&console_owner_lock); - owner = READ_ONCE(console_owner); - waiter = READ_ONCE(console_waiter); - if (!waiter && owner && owner != current) { - WRITE_ONCE(console_waiter, true); - spin = true; - } - raw_spin_unlock(&console_owner_lock); - - /* - * If there is an active printk() writing to the - * consoles, instead of having it write our data too, - * see if we can offload that load from the active - * printer, and do some printing ourselves. - * Go into a spin only if there isn't already a waiter - * spinning, and there is an active printer, and - * that active printer isn't us (recursive printk?). - */ - if (spin) { - /* We spin waiting for the owner to release us */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); - /* Owner will clear console_waiter on hand off */ - while (READ_ONCE(console_waiter)) - cpu_relax(); - - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - - /* - * The owner passed the console lock to us. - * Since we did not spin on console lock, annotate - * this as a trylock. Otherwise lockdep will - * complain. 
- */ - mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_); - console_unlock(); - printk_safe_enter_irqsave(flags); - } - printk_safe_exit_irqrestore(flags); - - } } return printed_len; @@ -1910,6 +1995,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size, static ssize_t msg_print_ext_body(char *buf, size_t size, char *dict, size_t dict_len, char *text, size_t text_len) { return 0; } +static void console_lock_spinning_enable(void) { } +static int console_lock_spinning_disable_and_check(void) { return 0; } static void call_console_drivers(const char *ext_text, size_t ext_len, const char *text, size_t len) {} static size_t msg_print_text(const struct printk_log *msg, @@ -2196,7 +2283,6 @@ void console_unlock(void) static u64 seen_seq; unsigned long flags; bool wake_klogd = false; - bool waiter = false; bool do_cond_resched, retry; if (console_suspended) { @@ -2291,31 +2377,16 @@ void console_unlock(void) * finish. This task can not be preempted if there is a * waiter waiting to take over. */ - raw_spin_lock(&console_owner_lock); - console_owner = current; - raw_spin_unlock(&console_owner_lock); - - /* The waiter may spin on us after setting console_owner */ - spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); + console_lock_spinning_enable(); stop_critical_timings(); /* don't trace print latency */ call_console_drivers(ext_text, ext_len, text, len); start_critical_timings(); - raw_spin_lock(&console_owner_lock); - waiter = READ_ONCE(console_waiter); - console_owner = NULL; - raw_spin_unlock(&console_owner_lock); - - /* - * If there is a waiter waiting for us, then pass the - * rest of the work load over to that waiter. 
- */ - if (waiter) - break; - - /* There was no waiter, and nothing will spin on us here */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); + if (console_lock_spinning_disable_and_check()) { + printk_safe_exit_irqrestore(flags); + return; + } printk_safe_exit_irqrestore(flags); @@ -2323,26 +2394,6 @@ void console_unlock(void) cond_resched(); } - /* - * If there is an active waiter waiting on the console_lock. - * Pass off the printing to the waiter, and the waiter - * will continue printing on its CPU, and when all writing - * has finished, the last printer will wake up klogd. - */ - if (waiter) { - WRITE_ONCE(console_waiter, false); - /* The waiter is now free to continue */ - spin_release(&console_owner_dep_map, 1, _THIS_IP_); - /* - * Hand off console_lock to waiter. The waiter will perform - * the up(). After this, the waiter is the console_lock owner. - */ - mutex_release(&console_lock_dep_map, 1, _THIS_IP_); - printk_safe_exit_irqrestore(flags); - /* Note, if waiter is set, logbuf_lock is not held */ - return; - } - console_locked = 0; /* Release the exclusive_console once it is used */ -- 2.13.6 ^ permalink raw reply related [flat|nested] 140+ messages in thread
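
The hand-off that the three helpers in the patch above implement can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the kernel implementation: a pthread mutex stands in for the raw spinlock console_owner_lock, an atomic flag stands in for console_lock, sched_yield() stands in for cpu_relax(), task identity is a plain integer, and the lockdep annotations and printk_safe sections are left out. Every model_* name, waiter_thread() and handoff_demo() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool console_locked;  /* models console_lock: held or free */
static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;
static int console_owner;           /* 0 = nobody; guarded by owner_lock */
static atomic_bool console_waiter;  /* true while a waiter is spinning */

static int model_console_trylock(void)
{
	/* atomic_exchange() returns the previous value of the flag. */
	return !atomic_exchange(&console_locked, true);
}

static void model_console_release(void)
{
	assert(atomic_load(&console_locked));
	atomic_store(&console_locked, false);
}

/* Owner side: from here on a waiter may start spinning on us. */
static void model_spinning_enable(int self)
{
	pthread_mutex_lock(&owner_lock);
	console_owner = self;
	pthread_mutex_unlock(&owner_lock);
}

/* Owner side: returns 1 if the lock was passed to a spinning waiter. */
static int model_spinning_disable_and_check(void)
{
	bool waiter;

	pthread_mutex_lock(&owner_lock);
	waiter = atomic_load(&console_waiter);
	console_owner = 0;
	pthread_mutex_unlock(&owner_lock);

	if (!waiter)
		return 0;
	/* Frees the spinner; console_locked stays set and is now its lock. */
	atomic_store(&console_waiter, false);
	return 1;
}

/* Waiter side: fast path first, then spin only behind an active owner. */
static int model_trylock_spinning(int self)
{
	bool spin = false;

	if (model_console_trylock())
		return 1;

	pthread_mutex_lock(&owner_lock);
	if (!atomic_load(&console_waiter) && console_owner &&
	    console_owner != self) {
		atomic_store(&console_waiter, true);
		spin = true;
	}
	pthread_mutex_unlock(&owner_lock);

	if (!spin)
		return 0;
	while (atomic_load(&console_waiter))
		sched_yield();
	return 1;
}

static void *waiter_thread(void *arg)
{
	int *got = arg;

	*got = model_trylock_spinning(2);
	if (*got)
		model_console_release(); /* the waiter does the final release */
	return NULL;
}

/* Returns 1 if the owner handed the lock over and the waiter released it. */
static int handoff_demo(void)
{
	pthread_t t;
	int waiter_got = 0;
	int handed;

	if (!model_console_trylock())
		return 0;
	model_spinning_enable(1);
	if (pthread_create(&t, NULL, waiter_thread, &waiter_got) != 0) {
		model_spinning_disable_and_check();
		model_console_release();
		return 0;
	}
	while (!atomic_load(&console_waiter))
		sched_yield(); /* wait until the waiter has registered */
	handed = model_spinning_disable_and_check();
	pthread_join(t, NULL);
	return handed && waiter_got && !atomic_load(&console_locked);
}
```

Calling handoff_demo() from main() (built with -pthread where the toolchain requires it) walks through one hand-off: the owner never releases the lock itself once a waiter is registered, which mirrors the early return in console_unlock() when console_lock_spinning_disable_and_check() reports that the lock rights were passed on.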
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-12 16:08 ` Petr Mladek @ 2018-01-12 16:36 ` Steven Rostedt 2018-01-15 16:08 ` Petr Mladek 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-12 16:36 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel On Fri, 12 Jan 2018 17:08:37 +0100 Petr Mladek <pmladek@suse.com> wrote: > On Fri 2018-01-12 10:37:54, Steven Rostedt wrote: > > On Thu, 11 Jan 2018 13:03:41 +0100 > > Petr Mladek <pmladek@suse.com> wrote: > > > All the other changes look good to me. I will use them in the next version. > > > > Great. > > Please, find below the updated version. If I get Ack at least from > Steven and no nack's, I will put it into linux-next next week. > Typos below. > > >From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001 > From: Petr Mladek <pmladek@suse.com> > Date: Fri, 22 Dec 2017 18:58:46 +0100 > Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers > > The commit ("printk: Add console owner and waiter logic to load balance > console writes") made vprintk_emit() and console_unlock() even more > complicated. > > This patch extracts the new code into 3 helper functions. They should > help to keep it rather self-contained. It will be easier to use and > maintain. > > This patch just shuffles the existing code. It does not change > the functionality. 
> > Signed-off-by: Petr Mladek <pmladek@suse.com> > --- > Changes against v5: > > + updated some comments (Steven) > + do console_trylock() in console_trylock_spinning() (Steven) > > kernel/printk/printk.c | 245 +++++++++++++++++++++++++++++-------------------- > 1 file changed, 148 insertions(+), 97 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 7e6459abba43..3057dbc69b4f 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -86,15 +86,8 @@ EXPORT_SYMBOL_GPL(console_drivers); > static struct lockdep_map console_lock_dep_map = { > .name = "console_lock" > }; > -static struct lockdep_map console_owner_dep_map = { > - .name = "console_owner" > -}; > #endif > > -static DEFINE_RAW_SPINLOCK(console_owner_lock); > -static struct task_struct *console_owner; > -static bool console_waiter; > - > enum devkmsg_log_bits { > __DEVKMSG_LOG_BIT_ON = 0, > __DEVKMSG_LOG_BIT_OFF, > @@ -1551,6 +1544,146 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len) > } > > /* > + * Special console_lock variants that help to reduce the risk of soft-lockups. > + * They allow to pass console_lock to another printk() call using a busy wait. > + */ > + > +#ifdef CONFIG_LOCKDEP > +static struct lockdep_map console_owner_dep_map = { > + .name = "console_owner" > +}; > +#endif > + > +static DEFINE_RAW_SPINLOCK(console_owner_lock); > +static struct task_struct *console_owner; > +static bool console_waiter; > + > +/** > + * console_lock_spinning_enable - mark beginning of code where another > + * thread might safely busy wait > + * > + * This basically converts console_lock into a spinlock. This marks > + * the section where the console_lock owner can not sleep, because > + * there may be a waiter spinning (like a spinlock). Also it must be > + * ready to hand over the lock at the end of the section. 
> + */
> +static void console_lock_spinning_enable(void)
> +{
> +	raw_spin_lock(&console_owner_lock);
> +	console_owner = current;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/* The waiter may spin on us after setting console_owner */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +}
> +
> +/**
> + * console_lock_spinning_disable_and_check - mark end of code where another
> + *	thread was able to busy wait and check if there is a waiter
> + *
> + * This is called at the end of the section where spinning is allowed.
> + * It has two functions. First, it is a signal that it is not longer

 "it is no longer safe"

> + * safe to start busy waiting for the lock. Second, it checks if
> + * there is a busy waiter and passes the lock rights to her.
> + *
> + * Important: Callers lose the lock if there was the busy waiter.

 "if there was a busy waiter"

> + * They must not touch items synchronized by console_lock
> + * in this case.
> + *
> + * Return: 1 if the lock rights were passed, 0 otherwise.
> + */
> +static int console_lock_spinning_disable_and_check(void)
> +{
> +	int waiter;
> +
> +	raw_spin_lock(&console_owner_lock);
> +	waiter = READ_ONCE(console_waiter);
> +	console_owner = NULL;
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	if (!waiter) {
> +		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +		return 0;
> +	}
> +
> +	/* The waiter is now free to continue */
> +	WRITE_ONCE(console_waiter, false);
> +
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	/*
> +	 * Hand off console_lock to waiter. The waiter will perform
> +	 * the up(). After this, the waiter is the console_lock owner.
> +	 */
> +	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> +	return 1;
> +}
> +
> +/**
> + * console_trylock_spinning - try to get console_lock by busy waiting
> + *
> + * This allows to busy wait for the console_lock when the current
> + * owner is running in a special marked sections. It means that

 "running in specially marked sections."

> + * the current owner is running and cannot reschedule until it
> + * is ready to loose the lock.

 "ready to lose the lock."

> + *
> + * Return: 1 if we got the lock, 0 othrewise
> + */
> +static int console_trylock_spinning(void)
> +{
> +	struct task_struct *owner = NULL;
> +	bool waiter;
> +	bool spin = false;
> +	unsigned long flags;
> +
> +	if (console_trylock())
> +		return 1;
> +
> +	printk_safe_enter_irqsave(flags);
> +
> +	raw_spin_lock(&console_owner_lock);
> +	owner = READ_ONCE(console_owner);
> +	waiter = READ_ONCE(console_waiter);
> +	if (!waiter && owner && owner != current) {
> +		WRITE_ONCE(console_waiter, true);
> +		spin = true;
> +	}
> +	raw_spin_unlock(&console_owner_lock);
> +
> +	/*
> +	 * If there is an active printk() writing to the
> +	 * consoles, instead of having it write our data too,
> +	 * see if we can offload that load from the active
> +	 * printer, and do some printing ourselves.
> +	 * Go into a spin only if there isn't already a waiter
> +	 * spinning, and there is an active printer, and
> +	 * that active printer isn't us (recursive printk?).
> +	 */
> +	if (!spin) {
> +		printk_safe_exit_irqrestore(flags);
> +		return 0;
> +	}
> +
> +	/* We spin waiting for the owner to release us */
> +	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +	/* Owner will clear console_waiter on hand off */
> +	while (READ_ONCE(console_waiter))
> +		cpu_relax();
> +	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +
> +	printk_safe_exit_irqrestore(flags);
> +	/*
> +	 * The owner passed the console lock to us.
> +	 * Since we did not spin on console lock, annotate
> +	 * this as a trylock. Otherwise lockdep will
> +	 * complain.
> +	 */
> +	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> +
> +	return 1;
> +}
> +
> +/*
>   * Call the console drivers, asking them to write out
>   * log_buf[start] to log_buf[end - 1].
>   * The console_lock must be held.
> @@ -1760,56 +1893,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * semaphore. The release will print out buffers and wake up
>  		 * /dev/kmsg and syslog() users.
>  		 */
> -		if (console_trylock()) {
> +		if (console_trylock_spinning())
>  			console_unlock();
> -		} else {
> -			struct task_struct *owner = NULL;
> -			bool waiter;
> -			bool spin = false;
> -
> -			printk_safe_enter_irqsave(flags);
> -
> -			raw_spin_lock(&console_owner_lock);
> -			owner = READ_ONCE(console_owner);
> -			waiter = READ_ONCE(console_waiter);
> -			if (!waiter && owner && owner != current) {
> -				WRITE_ONCE(console_waiter, true);
> -				spin = true;
> -			}
> -			raw_spin_unlock(&console_owner_lock);
> -
> -			/*
> -			 * If there is an active printk() writing to the
> -			 * consoles, instead of having it write our data too,
> -			 * see if we can offload that load from the active
> -			 * printer, and do some printing ourselves.
> -			 * Go into a spin only if there isn't already a waiter
> -			 * spinning, and there is an active printer, and
> -			 * that active printer isn't us (recursive printk?).
> -			 */
> -			if (spin) {
> -				/* We spin waiting for the owner to release us */
> -				spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> -				/* Owner will clear console_waiter on hand off */
> -				while (READ_ONCE(console_waiter))
> -					cpu_relax();
> -
> -				spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> -				printk_safe_exit_irqrestore(flags);
> -
> -				/*
> -				 * The owner passed the console lock to us.
> -				 * Since we did not spin on console lock, annotate
> -				 * this as a trylock. Otherwise lockdep will
> -				 * complain.
> -				 */
> -				mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
> -				console_unlock();
> -				printk_safe_enter_irqsave(flags);
> -			}
> -			printk_safe_exit_irqrestore(flags);
> -
> -		}
>  	}
>
>  	return printed_len;
> @@ -1910,6 +1995,8 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
>  static ssize_t msg_print_ext_body(char *buf, size_t size,
>  				  char *dict, size_t dict_len,
>  				  char *text, size_t text_len) { return 0; }
> +static void console_lock_spinning_enable(void) { }
> +static int console_lock_spinning_disable_and_check(void) { return 0; }
>  static void call_console_drivers(const char *ext_text, size_t ext_len,
>  				 const char *text, size_t len) {}
>  static size_t msg_print_text(const struct printk_log *msg,
> @@ -2196,7 +2283,6 @@ void console_unlock(void)
>  	static u64 seen_seq;
>  	unsigned long flags;
>  	bool wake_klogd = false;
> -	bool waiter = false;
>  	bool do_cond_resched, retry;
>
>  	if (console_suspended) {
> @@ -2291,31 +2377,16 @@ void console_unlock(void)
>  		 * finish. This task can not be preempted if there is a
>  		 * waiter waiting to take over.
>  		 */
> -		raw_spin_lock(&console_owner_lock);
> -		console_owner = current;
> -		raw_spin_unlock(&console_owner_lock);
> -
> -		/* The waiter may spin on us after setting console_owner */
> -		spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
> +		console_lock_spinning_enable();
>
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>
> -		raw_spin_lock(&console_owner_lock);
> -		waiter = READ_ONCE(console_waiter);
> -		console_owner = NULL;
> -		raw_spin_unlock(&console_owner_lock);
> -
> -		/*
> -		 * If there is a waiter waiting for us, then pass the
> -		 * rest of the work load over to that waiter.
> -		 */
> -		if (waiter)
> -			break;
> -
> -		/* There was no waiter, and nothing will spin on us here */
> -		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> +		if (console_lock_spinning_disable_and_check()) {
> +			printk_safe_exit_irqrestore(flags);
> +			return;
> +		}
>
>  		printk_safe_exit_irqrestore(flags);
>
> @@ -2323,26 +2394,6 @@ void console_unlock(void)
>  			cond_resched();
>  	}
>
> -	/*
> -	 * If there is an active waiter waiting on the console_lock.
> -	 * Pass off the printing to the waiter, and the waiter
> -	 * will continue printing on its CPU, and when all writing
> -	 * has finished, the last printer will wake up klogd.
> -	 */
> -	if (waiter) {
> -		WRITE_ONCE(console_waiter, false);
> -		/* The waiter is now free to continue */
> -		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
> -		/*
> -		 * Hand off console_lock to waiter. The waiter will perform
> -		 * the up(). After this, the waiter is the console_lock owner.
> -		 */
> -		mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
> -		printk_safe_exit_irqrestore(flags);
> -		/* Note, if waiter is set, logbuf_lock is not held */
> -		return;
> -	}
> -
>  	console_locked = 0;
>
>  	/* Release the exclusive_console once it is used */

Besides the typos (which should be fixed)...

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 140+ messages in thread
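
The three helpers in the patch amount to a small state machine. The following is only a single-threaded model of those state transitions, written for illustration: it has no locking, no spinning and no lockdep annotations. The struct, its fields and the model_* names are hypothetical, and the model stores the waiter's id where the kernel code keeps a plain bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the handshake state; not the kernel's data layout. */
struct handoff_model {
	int owner;		/* context calling the console drivers, or -1 */
	int waiter;		/* context busy waiting for the lock, or -1 */
	int lock_holder;	/* context that owns console_lock, or -1 */
};

/* Models console_lock_spinning_enable(): publish the active printer. */
static void model_spinning_enable(struct handoff_model *m, int self)
{
	assert(m->lock_holder == self);	/* only the lock owner may print */
	m->owner = self;
}

/*
 * Models the registration step of console_trylock_spinning().
 * Returns true if the caller would be allowed to spin: there is an
 * active printer, it is not the caller, and nobody else is waiting.
 */
static bool model_try_register_waiter(struct handoff_model *m, int self)
{
	if (m->waiter >= 0 || m->owner < 0 || m->owner == self)
		return false;
	m->waiter = self;
	return true;
}

/*
 * Models console_lock_spinning_disable_and_check().
 * Returns 1 if the lock rights were passed to the waiter, 0 otherwise.
 */
static int model_spinning_disable_and_check(struct handoff_model *m)
{
	int waiter = m->waiter;

	m->owner = -1;
	if (waiter < 0)
		return 0;

	m->waiter = -1;			/* this is what releases the spinner */
	m->lock_holder = waiter;	/* the waiter now owns console_lock */
	return 1;
}
```

In the model, as in the patch, a caller that gets 1 back from the disable step must stop touching anything protected by the lock, because the waiter already owns it.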
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-12 16:36 ` Steven Rostedt @ 2018-01-15 16:08 ` Petr Mladek 2018-01-16 5:05 ` Sergey Senozhatsky 0 siblings, 1 reply; 140+ messages in thread From: Petr Mladek @ 2018-01-15 16:08 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel On Fri 2018-01-12 11:36:27, Steven Rostedt wrote: > On Fri, 12 Jan 2018 17:08:37 +0100 > Petr Mladek <pmladek@suse.com> wrote: > > >From f67f70d910d9cf310a7bc73e97bf14097d31b059 Mon Sep 17 00:00:00 2001 > > From: Petr Mladek <pmladek@suse.com> > > Date: Fri, 22 Dec 2017 18:58:46 +0100 > > Subject: [PATCH v6 2/4] printk: Hide console waiter logic into helpers > > > > The commit ("printk: Add console owner and waiter logic to load balance > > console writes") made vprintk_emit() and console_unlock() even more > > complicated. > > > > This patch extracts the new code into 3 helper functions. They should > > help to keep it rather self-contained. It will be easier to use and > > maintain. > > > > This patch just shuffles the existing code. It does not change > > the functionality. > > > Besides the typos (which should be fixed)... > > Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> JFYI, I have fixed the typos, updated the commit message for the 1st patch and pushed all into printk.git, branch for-4.16-console-waiter-logic, see https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic I know that the discussion is not completely finished but it is somehow cycling. Sergey few times wrote that he would not block these patches. It is high time, I put it into linux-next. I could always remove it if decided in the discussion. 
Best Regards, Petr ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 2/2] printk: Hide console waiter logic into helpers 2018-01-15 16:08 ` Petr Mladek @ 2018-01-16 5:05 ` Sergey Senozhatsky 0 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-16 5:05 UTC (permalink / raw) To: Petr Mladek Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Tejun Heo, Pavel Machek, linux-kernel On (01/15/18 17:08), Petr Mladek wrote: > > Besides the typos (which should be fixed)... > > > > Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > > JFYI, I have fixed the typos, updated the commit message for > the 1st patch and pushed all into printk.git, > branch for-4.16-console-waiter-logic, see > https://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git/log/?h=for-4.16-console-waiter-logic > > I know that the discussion is not completely finished but it is > somehow cycling. Sergey few times wrote that he would not block > these patches. It is high time, I put it into linux-next. I could > always remove it if decided in the discussion. Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> at least we have preemption out of printk->user way (one of the things I tried to tell you), which looks more like a step forward to me personally. p.s. the printk is still pretty far from what I want it to be. vprintk_emit() still can cause disturbance and damage in pretty unrelated places. e.g. hung tasks on console_sem, and so on. I'm going to keep my out-of-tree patches alive, may be they will be merged upstream in some form or another may be not. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 13:24 [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Petr Mladek 2018-01-10 13:24 ` [PATCH v5 1/2] printk: Add console owner and waiter logic to load balance console writes Petr Mladek 2018-01-10 13:24 ` [PATCH v5 2/2] printk: Hide console waiter logic into helpers Petr Mladek @ 2018-01-10 14:05 ` Tejun Heo 2018-01-10 16:29 ` Petr Mladek 2018-01-10 18:05 ` Steven Rostedt 2 siblings, 2 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 14:05 UTC (permalink / raw) To: Petr Mladek Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote: > This is the last version of Steven's console owner/waiter logic. > Plus my proposal to hide it into 3 helper functions. It is supposed > to keep the code maintenable. > > The handshake really works. It happens about 10-times even during > boot of a simple system in qemu with a fast console here. It is > definitely able to avoid some softlockups. Let's see if it is > enough in practice. > > From my point of view, it is ready to go into linux-next so that > it can get some more test coverage. > > Steven's patch is the v4, see > https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home At least for now, Nacked-by: Tejun Heo <tj@kernel.org> Maybe this can be a part of solution but it's really worrying how the whole discussion around this subject is proceeding. You guys are trying to railroad actual problems. Please address actual technical problems. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo @ 2018-01-10 16:29 ` Petr Mladek 2018-01-10 17:02 ` Tejun Heo ` (2 more replies) 2018-01-10 18:05 ` Steven Rostedt 1 sibling, 3 replies; 140+ messages in thread From: Petr Mladek @ 2018-01-10 16:29 UTC (permalink / raw) To: Tejun Heo Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed 2018-01-10 06:05:47, Tejun Heo wrote: > On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote: > > This is the last version of Steven's console owner/waiter logic. > > Plus my proposal to hide it into 3 helper functions. It is supposed > > to keep the code maintenable. > > > > The handshake really works. It happens about 10-times even during > > boot of a simple system in qemu with a fast console here. It is > > definitely able to avoid some softlockups. Let's see if it is > > enough in practice. > > > > From my point of view, it is ready to go into linux-next so that > > it can get some more test coverage. > > > > Steven's patch is the v4, see > > https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home > > At least for now, > > Nacked-by: Tejun Heo <tj@kernel.org> > > Maybe this can be a part of solution but it's really worrying how the > whole discussion around this subject is proceeding. You guys are > trying to railroad actual problems. Please address actual technical > problems. I wonder how long you follow the discussions about solving this problem. I was able to find one old solution from Jan Kara that was sent on January 15, 2013. You might google it by "[PATCH] printk: Avoid softlockups in console_unlock()". 
For example, it is archived at http://linux-kernel.2935.n7.nabble.com/PATCH-printk-Avoid-softlockups-in-console-unlock-td581957.html The historic Jan Kara's solution is actually very similar to your proposal at https://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com Why Jan Kara's Solution was not accepted? Was it because he was not trying enough? No, Jan provided several variants (based on workqueues, irqwork, kthread), for example https://lkml.kernel.org/r/1395770101-24534-1-git-send-email-jack@suse.cz Also he discussed this on conferences, etc. Later Jan handed over the fight to Sergey Senozhatsky, see https://lkml.kernel.org/r/1457175338-1665-1-git-send-email-sergey.senozhatsky@gmail.com Also Sergey was very active. He was addressing many issues, discussed this on Kernel Summit twice. Why is it not upstream? All attempts up to v12 were blocked by someone (Andrew, Linus, Pavel Machek, few others) because they did not guarantee enough that the kthread would wake up and they would be able to see the messages! Sergey tried to address this by forcing synchronous mode in some situations (panic, suspend, kexec, ...). But people still complained. One important milestone was v12, see https://lkml.kernel.org/r/20160513131848.2087-1-sergey.senozhatsky@gmail.com It was the last version where we did the offload immediately from vprintk_emit(). The next versions used lazy offload from console_unlock() when the thread spent there too much time. IMHO, this is one very promising solution. It guarantees that softlockup would never happen. But it tries hard to get the messages out immediately. Unfortunately, it is very complicated. We have troubles to understand the concerns, for example see the long discussion about v3 at https://lkml.kernel.org/r/20170509082859.854-1-sergey.senozhatsky@gmail.com I admit that I did not have enough time to review this. 
Anyway, in October, 2017, Steven came up with a completely
different approach (console owner/waiter transfer). It does
not guarantee that the softlockup will not happen. But it
does not suffer from the problem that blocked the obvious
solution for years. It moves the owner at runtime, so it
is guaranteed that the new owner would continue printing.

Finally, no solution is perfect! There are contradicting
requirements on printk:

	get the messages out ASAP vs. do not block the system

The harder you try to get the messages out the more you could
block the entire system.

Where is the acceptable compromise? I am not sure. So far, the most
forceful people (Linus) did not see softlockups as a big problem.
They rather wanted to see the messages.

What could we do?

  + offload -> not acceptable so far

  + lazy offload -> might be acceptable if done more easily
    or gets review

  + try to transfer console owner (Steven) -> helps in
    several situations, so far only hand made stress code
    failed

  + reduce amount of messages
      + does it make sense to print the same warning
        1000-times?
      + could one long warning cause softlockup with
        the console owner transfer?

  + throttle thread producing too many messages
      + IMHO, very good solution but nobody investigated it

This patchset really helps in many situations. I believe that
it does not make things worse. You might block it and spend
another long time discussing other solutions.

Will we need a better solution? Maybe, probably. Is it possible
to provide an acceptable solution using offload? Probably using
lazy offload. In a reasonable time frame with a comparably low
risk? Me not.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 140+ messages in thread
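
One way to picture the "reduce amount of messages" option is a burst/interval limiter placed in front of a noisy call site. The kernel has printk_ratelimited() for this purpose; the code below is not that implementation, only a userspace sketch with hypothetical names (struct msg_ratelimit, msg_ratelimit_ok) and an abstract tick counter standing in for time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct msg_ratelimit {
	unsigned long interval;	/* window length in ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed so far (cumulative) */
};

/* Returns true if the message at tick `now` should be printed. */
static bool msg_ratelimit_ok(struct msg_ratelimit *rl, unsigned long now)
{
	assert(rl->interval > 0);

	/* Open a fresh window once the old one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

With burst = 3 and interval = 10, a warning emitted 1000 times within one tick gets through three times and is counted as missed 997 times, which is the kind of reduction the question about printing the same warning 1000 times is asking about.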
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 16:29 ` Petr Mladek @ 2018-01-10 17:02 ` Tejun Heo 2018-01-10 18:21 ` Peter Zijlstra ` (4 more replies) 2018-01-10 18:54 ` Steven Rostedt 2018-01-11 5:10 ` Sergey Senozhatsky 2 siblings, 5 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 17:02 UTC (permalink / raw) To: Petr Mladek, Linus Torvalds, akpm Cc: Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Linus, Andrew. On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote: > Where is the acceptable compromise? I am not sure. So far, the most > forceful people (Linus) did not see softlockups as a big problem. > They rather wanted to see the messages. Can you please chime in? Would you be opposed to offloading to an independent context even if it were only for cases where we were already punting? The thing with the current offloading is that we don't know who we're offloading to. It might end up in faster or slower context, or more importantly a dangerous one. The particular case that we've been seeing regularly in the fleet was the following scenario. 1. Console is IPMI emulated serial console. Super slow. Also netconsole is in use. 2. System runs out of memory, OOM triggers. 3. OOM handler is printing out OOM debug info. 4. While trying to emit the messages for netconsole, the network stack / driver tries to allocate memory and then fail, which in turn triggers allocation failure or other warning messages. printk was already flushing, so the messages are queued on the ring. 5. OOM handler keeps flushing but 4 repeats and the queue is never shrinking. 
Because OOM handler is trapped in printk flushing, it never manages to free memory and no one else can enter OOM path either, so the system is trapped in this state. The system usually never recovers in time once this sort of condition hits and the following was the patch that I suggested which only punts when messages are already being punted and we can easily make it less punty by delaying the punting by N messages. http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com We definitely can fix the above described case by e.g. preventing printk flushing task from queueing more messages or whatever, but it just seems really dumb for the system to die from things like this in general and it doesn't really take all that much to trigger the condition. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
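
The feedback loop in steps 4 and 5 can be written down as simple arithmetic. This is a toy model under a strong assumption, namely that every message written to the console causes a fixed number of new messages to be logged; the function name and parameters are made up for the illustration.

```c
#include <assert.h>

/*
 * Hypothetical model of the flush loop: each flushed message makes the
 * console path log `spawned` new messages (for example allocation
 * failure warnings). Returns the queue length after at most
 * `max_flushes` writes, stopping early if the queue drains.
 */
static unsigned long queue_after_flushing(unsigned long queued,
					  unsigned long spawned,
					  unsigned long max_flushes)
{
	unsigned long i;

	for (i = 0; i < max_flushes && queued > 0; i++) {
		queued--;		/* one message goes out on the console */
		queued += spawned;	/* messages logged while writing it */
	}

	assert(spawned > 0 || queued == 0 || i == max_flushes);
	return queued;
}
```

With spawned == 0 the queue drains; with spawned == 1 it stays at its initial length however long the flusher runs; with anything larger it grows. That matches the description above: the flushing context never gets out of the loop.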
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 17:02 ` Tejun Heo @ 2018-01-10 18:21 ` Peter Zijlstra 2018-01-10 18:30 ` Tejun Heo 2018-01-11 5:15 ` Sergey Senozhatsky 2018-01-10 18:22 ` Steven Rostedt ` (3 subsequent siblings) 4 siblings, 2 replies; 140+ messages in thread From: Peter Zijlstra @ 2018-01-10 18:21 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack > / driver tries to allocate memory and then fail, which in turn > triggers allocation failure or other warning messages. printk was > already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never > shrinking. Because OOM handler is trapped in printk flushing, it > never manages to free memory and no one else can enter OOM path > either, so the system is trapped in this state. Why not kill recursive OOM (msgs) ? ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:21 ` Peter Zijlstra @ 2018-01-10 18:30 ` Tejun Heo 2018-01-10 18:41 ` Peter Zijlstra 2018-01-11 5:15 ` Sergey Senozhatsky 1 sibling, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-10 18:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Peter. On Wed, Jan 10, 2018 at 07:21:53PM +0100, Peter Zijlstra wrote: > On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > > 2. System runs out of memory, OOM triggers. > > 3. OOM handler is printing out OOM debug info. > > 4. While trying to emit the messages for netconsole, the network stack > > / driver tries to allocate memory and then fail, which in turn > > triggers allocation failure or other warning messages. printk was > > already flushing, so the messages are queued on the ring. > > 5. OOM handler keeps flushing but 4 repeats and the queue is never > > shrinking. Because OOM handler is trapped in printk flushing, it > > never manages to free memory and no one else can enter OOM path > > either, so the system is trapped in this state. > > Why not kill recursive OOM (msgs) ? Sure, we can do that too, e.g. marking flushing thread and ignoring new messages from it, although that does come with its own downsides. The choices are * If we can make printk safe without much downside, that'd be the best option. * If we decide that we can't do that in a reasonable way, we sure can try to plug the identified cases. We might have to play a bit of whack-a-mole (e.g. the feedback loop might not necessarily be from the same context) but there likely are very few repeatable cases. 
It could be me not knowing the history of the discussion but up until now the discussion hasn't really gotten to that point since I brought up the case that we've been seeing. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:30 ` Tejun Heo @ 2018-01-10 18:41 ` Peter Zijlstra 2018-01-10 19:05 ` Tejun Heo 0 siblings, 1 reply; 140+ messages in thread From: Peter Zijlstra @ 2018-01-10 18:41 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, Jan 10, 2018 at 10:30:55AM -0800, Tejun Heo wrote: > > Why not kill recursive OOM (msgs) ? > > Sure, we can do that too, e.g. marking flushing thread and ignoring > new messages from it, although that does come with its own downsides. Typically we (scheduler) have removed printk()s (on boot) when BIGSMP folks say it creates boot pain. Much of it is now behind the sched_debug parameter, others are compressed. I've also seen other people reduce printk()s. In general reducing printk() is a good thing, its a low bandwidth channel for critical stuff like OOPSen and the like. ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:41 ` Peter Zijlstra @ 2018-01-10 19:05 ` Tejun Heo 0 siblings, 0 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 19:05 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, On Wed, Jan 10, 2018 at 07:41:44PM +0100, Peter Zijlstra wrote: > Typically we (scheduler) have removed printk()s (on boot) when BIGSMP > folks say it creates boot pain. Much of it is now behind the sched_debug > parameter, others are compressed. > > I've also seen other people reduce printk()s. > > In general reducing printk() is a good thing, its a low bandwidth > channel for critical stuff like OOPSen and the like. Yeah, sure, no disagreement there. It's just that this is a provision for when that breaks down. In the described scenario, it's also not caused by any particular one printing too many messages. OOM is just printing OOM info and packet tx is just printing standard alloc failed message (and some other following errors). It's the feedback loop which kills the machine. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:21 ` Peter Zijlstra 2018-01-10 18:30 ` Tejun Heo @ 2018-01-11 5:15 ` Sergey Senozhatsky 1 sibling, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-11 5:15 UTC (permalink / raw) To: Tejun Heo, Peter Zijlstra Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On (01/10/18 19:21), Peter Zijlstra wrote: > > On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > > 2. System runs out of memory, OOM triggers. > > 3. OOM handler is printing out OOM debug info. > > 4. While trying to emit the messages for netconsole, the network stack > > / driver tries to allocate memory and then fail, which in turn > > triggers allocation failure or other warning messages. printk was > > already flushing, so the messages are queued on the ring. > > 5. OOM handler keeps flushing but 4 repeats and the queue is never > > shrinking. Because OOM handler is trapped in printk flushing, it > > never manages to free memory and no one else can enter OOM path > > either, so the system is trapped in this state. > > Why not kill recursive OOM (msgs) ? hm... do I understand it correctly that there is a console_unlock()->call_console_drivers()->FOO_write()->kmalloc()->printk() recursion? we call console drivers from printk-safe context now. so those printks from kmalloc are redirected to per-CPU printk-safe buffer, which is limited in size (we probably might start losing some of those OOM messages) and which is flushed (log_store()) from another context. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
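
The trade-off mentioned here, a size-limited buffer that may lose messages and is flushed later from another context, can be sketched roughly as follows. This is not the printk-safe code; the buffer size, struct and function names are assumptions made for the example, and the model drops a message whole when it does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAFE_BUF_SIZE 64	/* hypothetical size, chosen for the example */

/* Hypothetical per-CPU style buffer for messages that cannot be logged now. */
struct safe_buf {
	char data[SAFE_BUF_SIZE];
	size_t len;		/* bytes currently stored */
	unsigned int lost;	/* messages dropped because the buffer was full */
};

/* Store msg if it fits completely, otherwise count it as lost. */
static size_t safe_buf_store(struct safe_buf *b, const char *msg)
{
	size_t n = strlen(msg);

	assert(b->len <= SAFE_BUF_SIZE);
	if (n > SAFE_BUF_SIZE - b->len) {
		b->lost++;
		return 0;
	}

	memcpy(b->data + b->len, msg, n);
	b->len += n;
	return n;
}

/*
 * Meant to run later, from a different context: copy out what was stored
 * (truncated to out_size) and empty the buffer. Returns bytes copied.
 */
static size_t safe_buf_flush(struct safe_buf *b, char *out, size_t out_size)
{
	size_t n = b->len < out_size ? b->len : out_size;

	memcpy(out, b->data, n);
	b->len = 0;
	return n;
}
```

The point of the model is only that recursion is cut (nothing is printed from the store path) at the price of a hard cap on how much can be kept until the flush.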
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 17:02 ` Tejun Heo 2018-01-10 18:21 ` Peter Zijlstra @ 2018-01-10 18:22 ` Steven Rostedt 2018-01-10 18:36 ` Tejun Heo 2018-01-10 18:40 ` Mathieu Desnoyers ` (2 subsequent siblings) 4 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 18:22 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Linus Torvalds, akpm, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 09:02:23 -0800 Tejun Heo <tj@kernel.org> wrote: > Hello, Linus, Andrew. > > On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote: > > Where is the acceptable compromise? I am not sure. So far, the most > > forceful people (Linus) did not see softlockups as a big problem. > > They rather wanted to see the messages. > > Can you please chime in? Would you be opposed to offloading to an > independent context even if it were only for cases where we were > already punting? The thing with the current offloading is that we > don't know who we're offloading to. It might end up in faster or > slower context, or more importantly a dangerous one. And how is that different to what we have today? It could be the "dangerous one" that did the first printk, and 100 other CPUs in "non dangerous" locations are constantly calling printk and making that "dangerous" one NEVER STOP. My solution is, if there are a ton of printks going off, each one will do a single print, and pass it to the next one. The printk will only be stuck doing more than one message if no more printks happen. Which is a good thing! Again, my algorithm bounds printk to printing AT MOST the printk buffer size. And that can only happen if there was a burst of printks on all CPUs, and then no printks. 
The one to get handed off the printk would just finish the buffer and continue. Which should not be an issue. > > The particular case that we've been seeing regularly in the fleet was > the following scenario. > > 1. Console is IPMI emulated serial console. Super slow. Also > netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack > / driver tries to allocate memory and then fail, which in turn > triggers allocation failure or other warning messages. printk was > already flushing, so the messages are queued on the ring. This looks like a bug in the netconsole, as the net console shouldn't print warnings if the warning is caused by it doing a print. Totally unrelated problem to my and Petr's patch set. Basically your argument is "I see this bug, and your patch doesn't fix it". Well maybe we are not solving your bug. Not to mention, it looks like printk isn't the bug, but net console is. > 5. OOM handler keeps flushing but 4 repeats and the queue is never > shrinking. Because OOM handler is trapped in printk flushing, it > never manages to free memory and no one else can enter OOM path > either, so the system is trapped in this state. > > The system usually never recovers in time once this sort of condition > hits and the following was the patch that I suggested which only punts > when messages are already being punted and we can easily make it less > punty by delaying the punting by N messages. > > http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com > > We definitely can fix the above described case by e.g. preventing > printk flushing task from queueing more messages or whatever, but it > just seems really dumb for the system to die from things like this in > general and it doesn't really take all that much to trigger the > condition. 
It seems really dumb to not fix that recursive net console bug, and try to solve it with a printk work around. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:22 ` Steven Rostedt @ 2018-01-10 18:36 ` Tejun Heo 0 siblings, 0 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 18:36 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Linus Torvalds, akpm, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, On Wed, Jan 10, 2018 at 01:22:55PM -0500, Steven Rostedt wrote: > > Can you please chime in? Would you be opposed to offloading to an > > independent context even if it were only for cases where we were > > already punting? The thing with the current offloading is that we > > don't know who we're offloading to. It might end up in faster or > > slower context, or more importantly a dangerous one. > > And how is that different to what we have today? It could be the > "dangerous one" that did the first printk, and 100 other CPUs in "non > dangerous" locations are constantly calling printk and making that > "dangerous" one NEVER STOP. So, the dangerous one would punt to the dedicated safe one beyond certain point. The posted version just flushes to the last message that it saw on entry to flush. > > The particular case that we've been seeing regularly in the fleet was > > the following scenario. > > > > 1. Console is IPMI emulated serial console. Super slow. Also > > netconsole is in use. > > 2. System runs out of memory, OOM triggers. > > 3. OOM handler is printing out OOM debug info. > > 4. While trying to emit the messages for netconsole, the network stack > > / driver tries to allocate memory and then fail, which in turn > > triggers allocation failure or other warning messages. printk was > > already flushing, so the messages are queued on the ring. 
>
> This looks like a bug in the netconsole, as the net console shouldn't
> print warnings if the warning is caused by it doing a print.
>
> Totally unrelated problem to my and Petr's patch set. Basically your
> argument is "I see this bug, and your patch doesn't fix it". Well maybe
> we are not solving your bug. Not to mention, it looks like printk isn't
> the bug, but net console is.
Sure, that could be the case, especially if punting to a safe context
can't be done reasonably (and there are downsides to silencing the
recursive messages too), but it'd also be really great to have printk
generally safe from bringing down a machine this way, right? I just
don't yet see why punting to a safe context is so difficult /
undesirable that we can't solve the issue in a general manner.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 17:02 ` Tejun Heo 2018-01-10 18:21 ` Peter Zijlstra 2018-01-10 18:22 ` Steven Rostedt @ 2018-01-10 18:40 ` Mathieu Desnoyers 2018-01-11 7:36 ` Sergey Senozhatsky 2018-01-24 9:36 ` Peter Zijlstra 2018-05-09 8:58 ` Sergey Senozhatsky 4 siblings, 1 reply; 140+ messages in thread From: Mathieu Desnoyers @ 2018-01-10 18:40 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Linus Torvalds, Andrew Morton, rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel ----- On Jan 10, 2018, at 12:02 PM, Tejun Heo tj@kernel.org wrote: > Hello, Linus, Andrew. > > On Wed, Jan 10, 2018 at 05:29:00PM +0100, Petr Mladek wrote: >> Where is the acceptable compromise? I am not sure. So far, the most >> forceful people (Linus) did not see softlockups as a big problem. >> They rather wanted to see the messages. > > Can you please chime in? Would you be opposed to offloading to an > independent context even if it were only for cases where we were > already punting? The thing with the current offloading is that we > don't know who we're offloading to. It might end up in faster or > slower context, or more importantly a dangerous one. > > The particular case that we've been seeing regularly in the fleet was > the following scenario. > > 1. Console is IPMI emulated serial console. Super slow. Also > netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack > / driver tries to allocate memory and then fail, which in turn > triggers allocation failure or other warning messages. printk was > already flushing, so the messages are queued on the ring. > 5. 
OOM handler keeps flushing but 4 repeats and the queue is never > shrinking. Because OOM handler is trapped in printk flushing, it > never manages to free memory and no one else can enter OOM path > either, so the system is trapped in this state. Hi Tejun, There appears to be two problems at hand. One is making sure a console buffer owner only flushes a bounded amount of data. Steven&Co patches seem to address this. The second problem you describe here appears to be related to the side-effects of console drivers, namely netconsole in this scenario. Its use of the network stack can allocate memory, which can fail, and therefore trigger more printk. Having a way to detect that code is directly called from a printk driver, and making sure error handling is _not_ done by pushing more printk messages to that printk driver in those situations comes to mind as a possible solution. The problem you describe seems to be _another_ issue of the current printk implementation which Steven's approach does not address, but I don't think that Steven's changes prevent doing further improvements on the netconsole driver front. I also don't see what's wrong in the incremental approach proposed by Steven. Even though it does not fix your console driver problem, his patchset appears to address some real-world latency issues. Thanks, Mathieu > > The system usually never recovers in time once this sort of condition > hits and the following was the patch that I suggested which only punts > when messages are already being punted and we can easily make it less > punty by delaying the punting by N messages. > > http://lkml.kernel.org/r/20171102135258.GO3252168@devbig577.frc2.facebook.com > > We definitely can fix the above described case by e.g. preventing > printk flushing task from queueing more messages or whatever, but it > just seems really dumb for the system to die from things like this in > general and it doesn't really take all that much to trigger the > condition. > > Thanks. 
> > -- > tejun -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 140+ messages in thread
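The idea raised in this message (detect that code is running underneath a console driver and skip the error printk in that case) can be sketched in userspace C. This is only an illustration under stated assumptions: in_console_write, warn_sketch() and console_write_sketch() are hypothetical names, not kernel APIs, and a real kernel version would probably need per-CPU or per-context state instead of one global flag.

```c
#include <stdbool.h>

/* Hypothetical state; a kernel version would likely need to be per-CPU. */
bool in_console_write;
unsigned int logged;      /* messages that reached the log buffer */
unsigned int suppressed;  /* messages dropped to break the recursion */

/* Stand-in for an error path such as an allocation-failure warning. */
void warn_sketch(const char *msg)
{
	(void)msg;
	if (in_console_write) {
		suppressed++;   /* do not feed the console we are called from */
		return;
	}
	logged++;
}

/* Stand-in for a console driver write that may fail to allocate memory. */
void console_write_sketch(bool alloc_fails)
{
	in_console_write = true;
	if (alloc_fails)
		warn_sketch("allocation failure");
	in_console_write = false;
}
```

Counting the suppressed messages instead of dropping them silently is a choice made for this sketch; as noted elsewhere in the thread, silencing recursive messages has downsides of its own.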
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:40 ` Mathieu Desnoyers @ 2018-01-11 7:36 ` Sergey Senozhatsky 2018-01-11 11:24 ` Petr Mladek 0 siblings, 1 reply; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-11 7:36 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Tejun Heo, Petr Mladek, Linus Torvalds, Andrew Morton, rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hi Mathieu, On (01/10/18 18:40), Mathieu Desnoyers wrote: [..] > > There appears to be two problems at hand. One is making sure a console > buffer owner only flushes a bounded amount of data. which, realistically, has quite little to do with the "and thus it fixes the lockups". logbuf size is mutable, the number of consoles we need to sequentially push the data to is mutable, the watchdog threshold is mutable... if combination of first two mutable things produces the result which makes the check based on the third mutable thing happy, then it's just an accident. my 5 cents. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-11 7:36 ` Sergey Senozhatsky
@ 2018-01-11 11:24 ` Petr Mladek
2018-01-11 13:19 ` Sergey Senozhatsky
0 siblings, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-11 11:24 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Mathieu Desnoyers, Tejun Heo, Linus Torvalds, Andrew Morton,
rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel
On Thu 2018-01-11 16:36:18, Sergey Senozhatsky wrote:
> Hi Mathieu,
>
> On (01/10/18 18:40), Mathieu Desnoyers wrote:
> [..]
> >
> > There appears to be two problems at hand. One is making sure a console
> > buffer owner only flushes a bounded amount of data.
>
> which, realistically, has quite little to do with the "and thus it
> fixes the lockups". logbuf size is mutable, the number of consoles we
> need to sequentially push the data to is mutable, the watchdog threshold
> is mutable... if combination of first two mutable things produces the
> result which makes the check based on the third mutable thing happy,
> then it's just an accident. my 5 cents.
Yes, there might be situations when Steven's patch is not able to
prevent the softlockup. But there is clear evidence that it will
help in many other situations.
The offload-based solution prevents the softlockup completely.
But there might be situations where the offload does not happen
and people might miss important messages.
And this is my point. Steven's patch is not perfect. But it helps
and it seems that it does not cause regressions.
The offload-based solution solves one problem in a better way but
it might cause regressions that have been discussed for years.
IMHO, nobody knows how effective Steven's solution is until we
push it into the wild. IMHO, it is safe to be pushed.
You might argue that we already know that Steven's solution will
not be enough. IMHO, the problem here is the term "real life example".
My understanding is that a real-life example is a softlockup report
from a system running in production or used for debugging any bug.
So far, Steven's opponents provided only hand made code or
scenarios. The provided code usually produced printk() messages
in a tight loop. In each case, there is not a consensus that
they simulated a real life problem well enough.
We might continue discussing it but basically any discussion
is theoretical unless there are hard data behind it.
I vote to push Steven's patch into the wild and see.
I really would like to give it a chance.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 11:24 ` Petr Mladek @ 2018-01-11 13:19 ` Sergey Senozhatsky 0 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-11 13:19 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Mathieu Desnoyers, Tejun Heo, Linus Torvalds, Andrew Morton, rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Jan Kara, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/11/18 12:24), Petr Mladek wrote: [..] > You might argue that we already know that Steven's solution will > not be enough. IMHO, the problem here is the term "real life example". this is really boring, how real life examples happen only on Steven's PC or Petr's qemu image. whatever. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 17:02 ` Tejun Heo ` (2 preceding siblings ...) 2018-01-10 18:40 ` Mathieu Desnoyers @ 2018-01-24 9:36 ` Peter Zijlstra 2018-01-24 18:46 ` Tejun Heo 2018-05-09 8:58 ` Sergey Senozhatsky 4 siblings, 1 reply; 140+ messages in thread From: Peter Zijlstra @ 2018-01-24 9:36 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > 1. Console is IPMI emulated serial console. Super slow. Also > netconsole is in use. So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had trouble like that (granted I don't typically trigger OOM storms, but they do occasionally happen). Is your IPMI much slower and not fixable to be faster? ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-24 9:36 ` Peter Zijlstra @ 2018-01-24 18:46 ` Tejun Heo 0 siblings, 0 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-24 18:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Petr Mladek, Linus Torvalds, akpm, Steven Rostedt, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Peter. On Wed, Jan 24, 2018 at 10:36:07AM +0100, Peter Zijlstra wrote: > On Wed, Jan 10, 2018 at 09:02:23AM -0800, Tejun Heo wrote: > > 1. Console is IPMI emulated serial console. Super slow. Also > > netconsole is in use. > > So my IPMI SoE typically run at 115200 Baud (or higher) and I've not had > trouble like that (granted I don't typically trigger OOM storms, but > they do occasionally happen). > > Is your IPMI much slower and not fixable to be faster? It looks like the latest machines have the baud rate at 57600 and I'm pretty sure we have a lot of slower ones. 57600 isn't 9600 but is still slow enough to get into trouble often enough. There are a huge number of machines running all sorts of things under heavy load and trying to rapidly deploy new kernels / features contributes to encountering bugs and weird corner cases. UART can run a lot faster and I have no idea why IPMI consoles behave as if they were connected over mile-long DB9 cables. Maybe we can convince hardware people to improve it but, even if that happened today, we'd still be looking at years of dealing with slower ones, and IPMI situation here is likely better than what many others are facing. idk, it's not a particularly difficult problem to solve from kernel side. Just need to figure out a better / more robust trade-off. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
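To put the baud rates mentioned above into perspective, here is a small back-of-the-envelope helper. It assumes plain 8N1 serial framing (10 bits on the wire per character), which may not match how an emulated IPMI console actually behaves, and the 1 MiB backlog used in the note below is a made-up figure, not something reported in the thread.

```c
/*
 * Milliseconds needed to push `bytes` characters through a serial line.
 * Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per character.
 */
unsigned long long serial_flush_ms(unsigned long long bytes,
				   unsigned long long baud)
{
	return bytes * 10ULL * 1000ULL / baud;
}
```

Under these assumptions a hypothetical 1 MiB backlog takes about 182 seconds at 57600 baud, about 91 seconds at 115200 baud, and roughly 18 minutes at 9600 baud.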
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 17:02 ` Tejun Heo ` (3 preceding siblings ...) 2018-01-24 9:36 ` Peter Zijlstra @ 2018-05-09 8:58 ` Sergey Senozhatsky 4 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-05-09 8:58 UTC (permalink / raw) To: Tejun Heo, Petr Mladek, Andrew Morton, Steven Rostedt, Johannes Weiner, Michal Hocko, Vlastimil Babka Cc: Petr Mladek, Linus Torvalds, Sergey Senozhatsky, linux-mm, Cong Wang, Dave Hansen, Mel Gorman, Peter Zijlstra, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hi, Move printk and (some of) MM people to the recipients list. On (01/10/18 09:02), Tejun Heo wrote: [..] > The particular case that we've been seeing regularly in the fleet was > the following scenario. > > 1. Console is IPMI emulated serial console. Super slow. Also > netconsole is in use. > 2. System runs out of memory, OOM triggers. > 3. OOM handler is printing out OOM debug info. > 4. While trying to emit the messages for netconsole, the network stack > / driver tries to allocate memory and then fail, which in turn > triggers allocation failure or other warning messages. printk was > already flushing, so the messages are queued on the ring. > 5. OOM handler keeps flushing but 4 repeats and the queue is never > shrinking. Because OOM handler is trapped in printk flushing, it > never manages to free memory and no one else can enter OOM path > either, so the system is trapped in this state. 
Tejun, we have a theory [since there are no logs available] that what you are looking at is something as follows: console_unlock() { for (;;) { call_console_drivers() kmalloc()/etc /* netconsole, skb kmalloc(), for instance */ __alloc_pages_slowpath() warn_alloc() /* a bunch of printk() -> log_store() */ } } Now, warn_alloc() is rate limited to DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST so net console driver can add 10 warn_alloc() reports every 5 seconds to the logbuf. You have a "super slow" IPMI console and net console. So for every logbuf entry we do: console_unlock() { for (;;) { call_console_drivers(msg) -> IPMI_write() call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> warn_alloc() -> ratelimit } } IPMI_write() is very slow, as you have noted, so it consumes time printing messages, simultaneously warn_alloc() rate limit depends on time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc() reports under 5 seconds, which gives net console a chance to add another 10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been flushed yet. It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST warn_alloc() rate limit is too permissive for your setup. Can you confirm that the theory is actually correct? If it is correct, then can we simply tweak warn_alloc() rate limit? Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s every 10 seconds"? What do MM folks think? -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
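The theory above can be played through in a toy userspace model. This is a sketch under stated assumptions, not kernel code: the interval and burst values are the "10 reports every 5 seconds" quoted in the message, every console write is assumed to hit a failing allocation, each warn_alloc() report is counted as a single log entry, and drain_time_ms() together with its parameters is a hypothetical name.

```c
#define RATELIMIT_INTERVAL_MS 5000  /* "every 5 seconds" from the message above */
#define RATELIMIT_BURST       10    /* "10 warn_alloc() reports" per interval */

/*
 * Returns the time (ms) at which the backlog reaches zero, or -1 if it is
 * still non-empty once horizon_ms has passed.
 */
long drain_time_ms(long backlog, long write_ms, long horizon_ms)
{
	long now = 0, window_start = 0;
	int printed = 0;

	while (backlog > 0) {
		if (now >= horizon_ms)
			return -1;

		now += write_ms;        /* slow console writes one entry */
		backlog--;

		/* The netconsole allocation fails, so a report is attempted. */
		if (now - window_start >= RATELIMIT_INTERVAL_MS) {
			window_start = now;     /* new rate limit window */
			printed = 0;
		}
		if (printed < RATELIMIT_BURST) {
			printed++;
			backlog++;      /* the report lands in the log buffer */
		}
	}
	return now;
}
```

In this model a backlog of 100 entries drains after 13 seconds when one write costs 100 ms, but never drains when one write costs 500 ms, because the console then flushes no more than the 10 entries per 5 seconds that the rate limit lets back in.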
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-10 16:29 ` Petr Mladek
2018-01-10 17:02 ` Tejun Heo
@ 2018-01-10 18:54 ` Steven Rostedt
2018-01-11 5:10 ` Sergey Senozhatsky
2 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-10 18:54 UTC (permalink / raw)
To: Petr Mladek
Cc: Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Pavel Machek, linux-kernel
On Wed, 10 Jan 2018 17:29:00 +0100
Petr Mladek <pmladek@suse.com> wrote:
> The next versions used lazy offload from console_unlock() when
> the thread spent there too much time. IMHO, this is one
> very promising solution. It guarantees that softlockup
> would never happen. But it tries hard to get the messages
> out immediately.
>
> Unfortunately, it is very complicated. We have troubles to understand
> the concerns, for example see the long discussion about v3 at
> https://lkml.kernel.org/r/20170509082859.854-1-sergey.senozhatsky@gmail.com
> I admit that I did not have enough time to review this.
>
>
> Anyway, in October, 2017, Steven came up with a completely
> different approach (console owner/waiter transfer). It does
> not guarantee that the softlockup will not happen. But it
> does not suffer from the problem that blocked the obvious
> solution for years. It moves the owner at runtime, so
> it is guaranteed that the new owner would continue
> printing.
Yes, I believe my solution and the offloading solution are two agnostic
solutions, and they are not mutually exclusive. They both can be
applied. But mine shouldn't be controversial as it has no down sides
from the current printk solution.
After adding this one, if issues come up, we should have a better idea
of how to handle them, because I'm betting the issues will only come up
in some pretty unique scenarios.
And they may even be solved without having to touch printk (and hurt the get out ASAP requirement). I don't want to paper over some real issues of those that use printk, with printk work arounds. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 16:29 ` Petr Mladek 2018-01-10 17:02 ` Tejun Heo 2018-01-10 18:54 ` Steven Rostedt @ 2018-01-11 5:10 ` Sergey Senozhatsky 2 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-11 5:10 UTC (permalink / raw) To: Petr Mladek Cc: Tejun Heo, Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On (01/10/18 17:29), Petr Mladek wrote: [..] > The next versions used lazy offload from console_unlock() when > the thread spent there too much time. IMHO, this is one > very promising solution. It guarantees that softlockup > would never happen. But it tries hard to get the messages > out immediately. a small addition. my motivation was not exactly the "lazy offload", but to keep the existing printk behavior as long as possible. and that "as long as possible" is determined by watchdog threshold, which is the only limit we must care about. as long as printing task spends more than 1/2 of watchdog threshold - we offload. otherwise we don't mess up with the existing logic/guarantees/etc. there is also a bunch of other things in the patch now. but nothing fantastically complex. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
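The rule described above (keep printing as today, and offload only once the printing task has used more than half of the watchdog threshold) might look roughly like the following userspace sketch. It is not taken from the offloading patch: flush_with_offload(), struct flush_result and the fixed per-message cost are all assumptions made for illustration.

```c
struct flush_result {
	unsigned int printed;    /* messages written by the original caller */
	unsigned int offloaded;  /* messages left for another context */
};

struct flush_result flush_with_offload(unsigned int pending,
				       unsigned int msg_cost_ms,
				       unsigned int watchdog_thresh_ms)
{
	struct flush_result res = { 0, 0 };
	unsigned int elapsed_ms = 0;

	while (pending > 0) {
		/* More than 1/2 of the watchdog threshold spent: stop here. */
		if (2 * elapsed_ms > watchdog_thresh_ms)
			break;
		elapsed_ms += msg_cost_ms;  /* print one message */
		pending--;
		res.printed++;
	}
	res.offloaded = pending;
	return res;
}
```

With 100 pending messages at 100 ms each and a 10 second threshold, the caller prints 51 messages and leaves 49 for the offload context; with only 10 pending messages nothing is offloaded and the existing behavior is unchanged.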
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 14:05 ` [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Tejun Heo 2018-01-10 16:29 ` Petr Mladek @ 2018-01-10 18:05 ` Steven Rostedt 2018-01-10 18:12 ` Tejun Heo 2018-01-11 4:58 ` Sergey Senozhatsky 1 sibling, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 18:05 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 06:05:47 -0800 Tejun Heo <tj@kernel.org> wrote: > On Wed, Jan 10, 2018 at 02:24:16PM +0100, Petr Mladek wrote: > > This is the last version of Steven's console owner/waiter logic. > > Plus my proposal to hide it into 3 helper functions. It is supposed > > to keep the code maintenable. > > > > The handshake really works. It happens about 10-times even during > > boot of a simple system in qemu with a fast console here. It is > > definitely able to avoid some softlockups. Let's see if it is > > enough in practice. > > > > From my point of view, it is ready to go into linux-next so that > > it can get some more test coverage. > > > > Steven's patch is the v4, see > > https://lkml.kernel.org/r/20171108102723.602216b1@gandalf.local.home > > At least for now, > > Nacked-by: Tejun Heo <tj@kernel.org> And I NACK your NACK! > > Maybe this can be a part of solution but it's really worrying how the > whole discussion around this subject is proceeding. You guys are > trying to railroad actual problems. Please address actual technical > problems. WE ARE! I presented the issue at Kernel Summit and everyone agreed with me that the issue my patch solves is a real issue. You have yet to demonstrate how this does not solve issues. 
I presented the history of printk, where it used to serialize all
printks. This was a problem when you had n CPUs doing printks at the
same time, because the n'th CPU had to wait for the n-1 CPUs to print
before it could. This was obviously an issue.
The "solution" to that was to have the first printk do the printing,
and all other printks that come in while it is printing just load
their data into the log buffer and continue. The first printk would get
stuck printing for everyone else. This was fine when we had 4 CPUs, but
now that we have boxes with 100s of CPUs, this is definitely an issue.
I demonstrated that this caused printk() to be unbounded, and there
were real-world scenarios that could easily cause a printk to never stop
printing.
My solution is to make printk() have a max bounded time to
print. This is how we solve things in the Real Time world, and it
makes perfect sense in this context. The point being, the max a
printk() could print, and that is if it was really unlucky, which would
be really unlikely because it would mean we had a burst of printks
followed by no printks, the bounded time is what it takes to print
the entire buffer.
My solution takes printk from its current unbounded state, and makes it
fixed bounded. Which means printk() is now an O(1) algorithm.
The solution is simple, everyone at KS agreed with it, there should be
no controversy here.
You on the other hand are showing unrealistic scenarios, and crying
that it's what you see in production, with no proof of it.
My printk solution is solid, with no risk of regressions of current
printk usages.
If anything, I'll pull these patches myself, and push them to Linus
directly. I'll Cc you and you can make your argument to NACK them, and
I'll make mine to take them.
-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
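The bounded versus unbounded argument above can be illustrated with a deliberately simplified, single-threaded model. It is not the patch itself: longest_print_run() and its parameters are hypothetical, time is cut into steps in which per_step callers log one message each and the current owner prints one, and with handoff enabled the first caller that finds an active owner becomes the waiter and takes over after the owner finishes its current message.

```c
/*
 * Returns the largest number of messages any single printk() caller
 * ends up writing to the console in one go.
 */
unsigned int longest_print_run(unsigned int steps, unsigned int per_step,
			       unsigned int capacity, int handoff)
{
	unsigned int pending = 0, run = 0, longest = 0;
	int have_owner = 0, have_waiter = 0;

	for (unsigned int s = 0; s < steps; s++) {
		for (unsigned int c = 0; c < per_step; c++) {
			if (pending < capacity)
				pending++;      /* full buffer: oldest entry is lost */
			if (!have_owner)
				have_owner = 1; /* first caller starts printing */
			else if (handoff && !have_waiter)
				have_waiter = 1;        /* one waiter at a time */
		}
		if (pending > 0) {
			pending--;      /* owner prints one message */
			run++;
		}
		if (have_waiter) {
			/* Owner hands the console over to the waiter. */
			if (run > longest)
				longest = run;
			run = 0;
			have_waiter = 0;
		}
	}
	/* No more callers: the last owner drains whatever is left. */
	run += pending;
	if (run > longest)
		longest = run;
	return longest;
}
```

In this toy model, with two callers per step and a 64-entry buffer, the first caller prints 1063 messages over 1000 steps without handoff and the number keeps growing with the length of the storm, while with handoff the longest run stays at 63 (under the buffer capacity) no matter how long the storm lasts.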
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:05 ` Steven Rostedt @ 2018-01-10 18:12 ` Tejun Heo 2018-01-10 18:14 ` Tejun Heo 2018-01-10 18:41 ` Steven Rostedt 2018-01-11 4:58 ` Sergey Senozhatsky 1 sibling, 2 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 18:12 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Steven. So, everything else on your message, sure. You do what you have to do, but I really don't understand the following part, and this has been the main source of frustration in the whole discussion. On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote: > You on the other hand are showing unrealistic scenarios, and crying > that it's what you see in production, with no proof of it. I've explained the same scenario multiple times. Unless you're assuming that I'm lying, it should be amply clear that the scenario is unrealistic - we've been seeing them taking place repeatedly for quite a while. What I don't understand is why we can't address this seemingly obvious problem. If there are technical reasons and the consensus is to not solve this within flushing logic, sure, we can deal with it otherwise, but we at least have to be able to agree that there are actual issues here, no? Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:12 ` Tejun Heo @ 2018-01-10 18:14 ` Tejun Heo 2018-01-10 18:45 ` Steven Rostedt 2018-01-10 18:41 ` Steven Rostedt 1 sibling, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-10 18:14 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, Jan 10, 2018 at 10:12:52AM -0800, Tejun Heo wrote: > Hello, Steven. > > So, everything else on your message, sure. You do what you have to > do, but I really don't understand the following part, and this has > been the main source of frustration in the whole discussion. > > On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote: > > You on the other hand are showing unrealistic scenarios, and crying > > that it's what you see in production, with no proof of it. > > I've explained the same scenario multiple times. Unless you're > assuming that I'm lying, it should be amply clear that the scenario is > unrealistic - we've been seeing them taking place repeatedly for quite > a while. Oops, I meant to write "not unrealistic". Anyways, if you think I'm lying, please let me know. I can ask others who have been seeing the issue to join the thread. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:14 ` Tejun Heo @ 2018-01-10 18:45 ` Steven Rostedt 0 siblings, 0 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 18:45 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 10:14:59 -0800 Tejun Heo <tj@kernel.org> wrote: > On Wed, Jan 10, 2018 at 10:12:52AM -0800, Tejun Heo wrote: > > Hello, Steven. > > > > So, everything else on your message, sure. You do what you have to > > do, but I really don't understand the following part, and this has > > been the main source of frustration in the whole discussion. > > > > On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote: > > > You on the other hand are showing unrealistic scenarios, and crying > > > that it's what you see in production, with no proof of it. > > > > I've explained the same scenario multiple times. Unless you're > > assuming that I'm lying, it should be amply clear that the scenario is > > unrealistic - we've been seeing them taking place repeatedly for quite > > a while. > > Oops, I meant to write "not unrealistic". Anyways, if you think I'm > lying, please let me know. I can ask others who have been seeing the > issue to join the thread. I don't believe you are lying. I believe you are interpreting one problem as another. I don't see this is a printk bug, I see it as a recursive OOM + net console bug. My patch is not trying to solve that, and I don't believe it should be solved via printk. I'm trying to solve the problem of printk spamming all CPUs causing a single CPU to lock up. That is a real bug that has been hit in various different scenarios, where there is no other underlying bug. 
This issue is a printk problem, and my solution solves it for printk. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:12 ` Tejun Heo 2018-01-10 18:14 ` Tejun Heo @ 2018-01-10 18:41 ` Steven Rostedt 2018-01-10 18:57 ` Tejun Heo 1 sibling, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 18:41 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 10:12:52 -0800 Tejun Heo <tj@kernel.org> wrote: > Hello, Steven. > > So, everything else on your message, sure. You do what you have to > do, but I really don't understand the following part, and this has > been the main source of frustration in the whole discussion. > > On Wed, Jan 10, 2018 at 01:05:17PM -0500, Steven Rostedt wrote: > > You on the other hand are showing unrealistic scenarios, and crying > > that it's what you see in production, with no proof of it. > > I've explained the same scenario multiple times. Unless you're > assuming that I'm lying, it should be amply clear that the scenario is > unrealistic - we've been seeing them taking place repeatedly for quite > a while. The one scenario you did show was the recursive OOM messages, and as Peter Zijlstra pointed out that's more of a bug in the net console than a printk bug. > > What I don't understand is why we can't address this seemingly obvious > problem. If there are technical reasons and the consensus is to not > solve this within flushing logic, sure, we can deal with it otherwise, > but we at least have to be able to agree that there are actual issues > here, no? The issue with the solution you want to do with printk is that it can break existing printk usages. As Petr said, people want printk to do two things. 1 - print out data ASAP, 2 - not lock up the system. 
The two are fighting each other. You care more about 2 where I (and others, like Peter Zijlstra and Linus) care more about 1. My solution can help with 2 without doing anything to hurt 1. You are NACKing my solution because it doesn't solve this bug with net console. I believe net console should be fixed. You believe that printk should have a work around to not let net console type bugs occur. Which to me is papering over the real bugs. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:41 ` Steven Rostedt @ 2018-01-10 18:57 ` Tejun Heo 2018-01-10 19:17 ` Steven Rostedt 0 siblings, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-10 18:57 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Steven. On Wed, Jan 10, 2018 at 01:41:57PM -0500, Steven Rostedt wrote: > The issue with the solution you want to do with printk is that it can > break existing printk usages. As Petr said, people want printk to do two > things. 1 - print out data ASAP, 2 - not lock up the system. The two > are fighting each other. You care more about 2 where I (and others, > like Peter Zijlstra and Linus) care more about 1. > > My solution can help with 2 without doing anything to hurt 1. I'm not really sure why punting to a safe context is necessarily unacceptable in terms of #1 because there seems to be a pretty wide gap between printing useful messages synchronously and a system being caught in printk flush to the point where the system is not operational at all. > You are NACKing my solution because it doesn't solve this bug with net > console. I believe net console should be fixed. You believe that printk > should have a work around to not let net console type bugs occur. Which > to me is papering over the real bugs. As I wrote along with nack, I was more concerned with how this was pushed forward by saying that actual problems are not real. As for the netconsole part, sure, that can be one way, but please consider that the messages could be coming from network drivers, of which we have many and a lot of them aren't too high quality. 
Plus, netconsole is a separate path and network drivers can easily malfunction on memory allocation failures. Again, not a critical problem. We can decide either way but it'd be better to be generally safe (if we can do that reasonably), right? Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 18:57 ` Tejun Heo @ 2018-01-10 19:17 ` Steven Rostedt 2018-01-10 19:34 ` Tejun Heo 2018-01-11 5:35 ` Sergey Senozhatsky 0 siblings, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 19:17 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 10:57:47 -0800 Tejun Heo <tj@kernel.org> wrote: > Hello, Steven. > > On Wed, Jan 10, 2018 at 01:41:57PM -0500, Steven Rostedt wrote: > > The issue with the solution you want to do with printk is that it can > > break existing printk usages. As Petr said, people want printk to do two > > things. 1 - print out data ASAP, 2 - not lock up the system. The two > > are fighting each other. You care more about 2 where I (and others, > > like Peter Zijlstra and Linus) care more about 1. > > > > My solution can help with 2 without doing anything to hurt 1. > > I'm not really sure why punting to a safe context is necessarily > unacceptable in terms of #1 because there seems to be a pretty wide > gap between printing useful messages synchronously and a system being > caught in printk flush to the point where the system is not > operational at all. And what do you define as a "safe" context. And what happens when the system is hosed and that "safe" context no longer exists? How do you know that the safe context is gone? > > > You are NACKing my solution because it doesn't solve this bug with net > > console. I believe net console should be fixed. You believe that printk > > should have a work around to not let net console type bugs occur. Which > > to me is papering over the real bugs. 
> > As I wrote along with nack, I was more concerned with how this was > pushed forward by saying that actual problems are not real. You mean you saying that? I never created this patch set for the problems you reported. You came in nacking this saying that it doesn't solve your problems and showed some totally unrealistic module that triggers issues that my patch doesn't solve. I admit now that the OOM net console bug is a real issue. But my saying that you were being unrealistic was more about that module you posted to try to demonstrate the issue. This is not the issue I'm trying to solve, and I don't understand why you are against my solution when it is agnostic to any solution that you want to do as well. One way to have an offload solution added on top of mine, is to have a limit in how many messages the printk will do. Honestly, I believe it should always printk its own message if there are no others trying to do a print. Yes, that may still not solve the net console bug, but it helps guarantee that printks get out. But if a printk starts printing more than one message, perhaps that is where we can look at offloading. Similar to how softirq works. If a softirq repeats too many times, it is offloaded to the ksoftirqd thread. We can have a similar approach to printk. > > As for the netconsole part, sure, that can be one way, but please > consider that the messages could be coming from network drivers, of > which we have many and a lot of them aren't too high quality. Plus, > netconsole is a separate path and network drivers can easily > malfunction on memory allocation failures. > > Again, not a critical problem. We can decide either way but it'd be > better to be generally safe (if we can do that reasonably), right? OK, lets start over. Right now my focus is an incremental approach. I'm not trying to solve all issues that printk has. I've focused on a single issue, and that is that printk is unbounded. 
Coming from a Real Time background, I find that is a big problem. I hate unbounded algorithms. I looked at this and found a way to make printk have a max bounded time it can print. Sure, it can be more than what you want, but it is a constant time, that can be measured. Hence, it is an O(1) solution. Now, if there is still issues with printk, there may be cases where offloading makes sense. I don't see why we should stop my solution because we are not addressing these other issues where offloading may make sense. My solution is simple, and does not impact other solutions. It may even show that other solutions are not needed. But that's a good thing. I'm not against an offloading solution if it can solve issues without impacting the other printk use cases. I'm currently only focusing on this solution which you are fighting me against. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
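The softirq-style idea floated above (a printk() caller always prints its own message, and only past some limit is the rest handed to a thread) can be sketched roughly as follows. This is a userspace illustration only: `flush_with_budget`, `struct flush_result` and the `budget` parameter are hypothetical names, not anything in kernel/printk/printk.c, and no such limit exists in the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of one flush attempt; not a kernel structure. */
struct flush_result {
	size_t printed_inline;	/* records written by the caller itself */
	size_t offloaded;	/* records left for a helper thread */
};

/*
 * Print up to `budget` pending records in the caller's context, but
 * always at least one (the caller's own message). Whatever remains is
 * counted as offloaded, the way softirq work past its limit is left
 * to ksoftirqd.
 */
struct flush_result flush_with_budget(size_t pending, size_t budget)
{
	struct flush_result r = { 0, 0 };

	if (pending == 0)
		return r;
	if (budget == 0)
		budget = 1;	/* the caller always prints its own message */

	r.printed_inline = pending < budget ? pending : budget;
	r.offloaded = pending - r.printed_inline;
	return r;
}
```

The sketch only captures the accounting. Whether a helper thread can still run when the system is in trouble is exactly the "safe context" question raised earlier in this message.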
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 19:17 ` Steven Rostedt @ 2018-01-10 19:34 ` Tejun Heo 2018-01-10 19:44 ` Steven Rostedt 2018-01-11 5:35 ` Sergey Senozhatsky 1 sibling, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-10 19:34 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Steven. On Wed, Jan 10, 2018 at 02:17:58PM -0500, Steven Rostedt wrote: > > I'm not really sure why punting to a safe context is necessarily > > unacceptable in terms of #1 because there seems to be a pretty wide > > gap between printing useful messages synchronously and a system being > > caught in printk flush to the point where the system is not > > operational at all. > > And what do you define as a "safe" context. And what happens when the > system is hosed and that "safe" context no longer exists? How do you > know that the safe context is gone? Hmm.. yeah, we have that problem now too. Panic bypassing synchronizations solves some of that I guess. > I admit now that the OOM net console bug is a real issue. But my > saying that you were being unrealistic was more about that module you > posted to try to demonstrate the issue. Heh, our recollections would differ widely there, but let's leave it at that. > Right now my focus is an incremental approach. I'm not trying to solve > all issues that printk has. I've focused on a single issue, and that is > that printk is unbounded. Coming from a Real Time background, I find > that is a big problem. I hate unbounded algorithms. I looked at this > and found a way to make printk have a max bounded time it can print. > Sure, it can be more than what you want, but it is a constant time, > that can be measured. 
Hence, it is an O(1) solution. It is bound iff there are contexts which can bounce the flushing role among them, right? > Now, if there is still issues with printk, there may be cases where > offloading makes sense. I don't see why we should stop my solution > because we are not addressing these other issues where offloading may > make sense. My solution is simple, and does not impact other solutions. > It may even show that other solutions are not needed. But that's a good > thing. > > I'm not against an offloading solution if it can solve issues without > impacting the other printk use cases. I'm currently only focusing on > this solution which you are fighting me against. Oh yeah, sure. It might actually be pretty simple to combine into your solution. For example, can't we just always make sure that there's at least one sleepable context which participates in your pingpongs, which only kicks in when a particular context is trapped too long? Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 19:34 ` Tejun Heo @ 2018-01-10 19:44 ` Steven Rostedt 2018-01-10 22:44 ` Tejun Heo 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-10 19:44 UTC (permalink / raw) To: Tejun Heo Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel On Wed, 10 Jan 2018 11:34:51 -0800 Tejun Heo <tj@kernel.org> wrote: > > Right now my focus is an incremental approach. I'm not trying to solve > > all issues that printk has. I've focused on a single issue, and that is > > that printk is unbounded. Coming from a Real Time background, I find > > that is a big problem. I hate unbounded algorithms. I looked at this > > and found a way to make printk have a max bounded time it can print. > > Sure, it can be more than what you want, but it is a constant time, > > that can be measured. Hence, it is an O(1) solution. > > It is bound iff there are contexts which can bounce the flushing role > among them, right? No, not at all. The printk can only print what's in the buffer. The buffer can only get more to print if another printk occurs. If that happens, that other printk takes over. Thus, any single printk can print at most one buffer full. Which is bounded to the size of the buffer. Yes, there can be the case that printks are added via an interrupt, but then again, it's an issue that a single CPU. And printks from interrupt context should be considered critical, part of the ASAP category. If they are not critical, then they shouldn't be doing printks. That may be a place were we can add a "printk_delay", for things like non critical printks in interrupt context, that can trigger offloading? 
> > > Now, if there is still issues with printk, there may be cases where > > offloading makes sense. I don't see why we should stop my solution > > because we are not addressing these other issues where offloading may > > make sense. My solution is simple, and does not impact other solutions. > > It may even show that other solutions are not needed. But that's a good > > thing. > > > > I'm not against an offloading solution if it can solve issues without > > impacting the other printk use cases. I'm currently only focusing on > > this solution which you are fighting me against. > > Oh yeah, sure. It might actually be pretty simple to combine into > your solution. For example, can't we just always make sure that > there's at least one sleepable context which participates in your > pingpongs, which only kicks in when a particular context is trapped > too long? The solution can be extended to that if the need exists, yes. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-10 19:44 ` Steven Rostedt @ 2018-01-10 22:44 ` Tejun Heo 0 siblings, 0 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-10 22:44 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Sergey Senozhatsky, Pavel Machek, linux-kernel Hello, Steven. On Wed, Jan 10, 2018 at 02:44:55PM -0500, Steven Rostedt wrote: > Yes, there can be the case that printks are added via an interrupt, but > then again, it's an issue that a single CPU. And printks from interrupt > context should be considered critical, part of the ASAP category. If > they are not critical, then they shouldn't be doing printks. That may > be a place were we can add a "printk_delay", for things like non > critical printks in interrupt context, that can trigger offloading? Ideally, if we can annoate all those, that would be great. I don't feel too confident about that tho. Here is one network driver that we deal with often. $ wc -l $(git ls-files drivers/net/ethernet/mellanox/mlx5) | tail -1 48029 total It's close to 50k lines of code and AFAICT this seems to be the trend. Most things which are happening in the driver are complicated and sometimes lead to surprising behaviors. With memory allocation failures thrown in, idk. I think our exposure to this sort of problem is pretty wide and we can't reasonably keep close eyes on them, especially for problems which only happen under high stress conditions which aren't tested that easily. > > Oh yeah, sure. It might actually be pretty simple to combine into > > your solution. 
For example, can't we just always make sure that > > there's at least one sleepable context which participates in your > > pingpongs, which only kicks in when a particular context is trapped > > too long? > > The solution can be extended to that if the need exists, yes. I think it'd be really great if the core code can protect itself against these things going haywire. We can ignore messages generated while being recursive from netconsole, but that would mean, for example, if that giant driver messes up in that path (netconsole under memory pressure), it'd be painful to debug. So, if we can, it'd be really great to have a generic protection which can handle these situations. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-10 19:17 ` Steven Rostedt
2018-01-10 19:34 ` Tejun Heo
@ 2018-01-11 5:35 ` Sergey Senozhatsky
1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11 5:35 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 14:17), Steven Rostedt wrote:
[..]
> OK, lets start over.

good.

> Right now my focus is an incremental approach. I'm not trying to solve
> all issues that printk has. I've focused on a single issue, and that is
> that printk is unbounded. Coming from a Real Time background, I find
> that is a big problem. I hate unbounded algorithms.

agreed! so why not bound it to watchdog threshold then? why bound it to
a random O(logbuf) thing? which is not even constant. when you
un-register or disable one or several consoles then
call_console_drivers() becomes faster; when you register/enable consoles
then the entire call_console_drivers() becomes slower. how do we build a
reliable algorithm on that O(logbuf)?

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-10 18:05 ` Steven Rostedt
2018-01-10 18:12 ` Tejun Heo
@ 2018-01-11 4:58 ` Sergey Senozhatsky
2018-01-11 9:34 ` Petr Mladek
1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-11 4:58 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Sergey Senozhatsky, Pavel Machek, linux-kernel

On (01/10/18 13:05), Steven Rostedt wrote:
[..]
> My solution takes printk from its current unbounded state, and makes it
> fixed bounded. Which means printk() is now a O(1) algorithm.
                                               ^^^ O(logbuf)

and O(logbuf) > watchdog_thresh is totally possible. and there is
nothing super unlucky in having O(logbuf).

limiting printk is the right way to go, sure. but you limit it to the
wrong thing. limiting it to logbuf is not enough, especially given that
logbuf size is configurable via kernel param - it's a moving target. if
one wants printk to stop disappointing the watchdog then printk must
learn to respect watchdog's threshold.

https://marc.info/?l=linux-kernel&m=151444381104068

hence a small fix up

---

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8882a4bf2a9e..4efa7542d84d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2341,6 +2341,14 @@ void console_unlock(void)
 		printk_safe_enter_irqsave(flags);
 		raw_spin_lock(&logbuf_lock);
+
+		if (log_next_seq - console_seq > 666) {
+			console_seq = log_next_seq;
+			raw_spin_unlock(&logbuf_lock);
+			printk_safe_exit_irqrestore(flags);
+			panic("you mad bro? this can softlockup your system! let me fix that for you");
+		}
+
 		if (seen_seq != log_next_seq) {
 			wake_klogd = true;
 			seen_seq = log_next_seq;

---

> The solution is simple, everyone at KS agreed with it, there should be
> no controversy here.

frankly speaking, that's not what I recall ;)

[..]
> My printk solution is solid, with no risk of regressions of current
> printk usages.

except that handing off a console_sem to atomic task when there is
O(logbuf) > watchdog_thresh is a regression, basically... it is what
it is.

> If anything, I'll pull theses patches myself, and push them to Linus
> directly

lovely.

	-ss

^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 4:58 ` Sergey Senozhatsky @ 2018-01-11 9:34 ` Petr Mladek 2018-01-11 10:38 ` Sergey Senozhatsky 0 siblings, 1 reply; 140+ messages in thread From: Petr Mladek @ 2018-01-11 9:34 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Thu 2018-01-11 13:58:17, Sergey Senozhatsky wrote: > On (01/10/18 13:05), Steven Rostedt wrote: > > The solution is simple, everyone at KS agreed with it, there should be > > no controversy here. > > frankly speaking, that's not what I recall ;) To be honest, I do not longer remember the details. I think that nobody was really against that solution. Of course, there were doubts and other proposals. I think that I was actually the most sceptical guy there. I would split my old doubts into three areas: + new possible deadlocks -> I was wrong + did not fully prevent softlockups -> no real life example in hands + looked tricky and complex -> like many other new things You see that I have changed my mind and decided to give this solution a chance. > [..] > > My printk solution is solid, with no risk of regressions of current > > printk usages. > > except that handing off a console_sem to atomic task when there > is O(logbuf) > watchdog_thresh is a regression, basically... > it is what it is. How this could be a regression? Is not the victim that handles other printk's random? What protected the atomic task to handle the other printks before this patch? Or do you have a system that started to suffer from softlockups with this patchset and did not do this before? > > > If anything, I'll pull theses patches myself, and push them to Linus > > directly > > lovely. 
Do you know about any system where this patch made the softlockup deterministically or statistically more likely, please? Best Regards, Petr ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 9:34 ` Petr Mladek @ 2018-01-11 10:38 ` Sergey Senozhatsky 2018-01-11 11:50 ` Petr Mladek 2018-01-11 16:29 ` Steven Rostedt 0 siblings, 2 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-11 10:38 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/11/18 10:34), Petr Mladek wrote: [..] > > except that handing off a console_sem to atomic task when there > > is O(logbuf) > watchdog_thresh is a regression, basically... > > it is what it is. > > How this could be a regression? Is not the victim that handles > other printk's random? What protected the atomic task to > handle the other printks before this patch? the non-atomic -> atomic context console_sem transfer. we previously would have kept the console_sem owner to its non-atomic owner. we now will make sure that if printk from atomic context happens then it will make it to console_unlock() loop. emphasis on O(logbuf) > watchdog_thresh. - if the patch's goal is to bound (not necessarily to watchdog's threshold) the amount of time we spend in console_unlock(), then the patch is kinda overcomplicated. but no further questions in this case. - but if the patch's goal is to bound (to lockup threshold) the amount of time spent in console_unlock() in order to avoid lockups [uh, a reason], then the patch is rather oversimplified. claiming that for any given A, B, C the following is always true A * B < C where A is the amount of data to print in the worst case B the time call_console_drivers() needs to print a single char to all registered and enabled consoles C the watchdog's threshold is not really a step forward. 
and the "last console_sem owner prints all pending messages" rule is still there. > Or do you have a system that started to suffer from softlockups > with this patchset and did not do this before? [..] > Do you know about any system where this patch made the softlockup > deterministically or statistically more likely, please? I have explained many, many times why my boards die just like before. why would I bother collecting any numbers... -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
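The A * B < C condition above can be made concrete with a small worked calculation. The numbers in the usage note are assumptions chosen for illustration, not measurements from this thread: a 128 KiB log buffer, a single serial console costing 10 bit times per byte, and a 20 second soft-lockup threshold. The helper names `flush_time_ms` and `fits_threshold` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A * B: time in milliseconds to push `bytes` of log data through one
 * UART running at `baud`, assuming 10 bit times per byte (8N1 framing).
 * Integer division truncates toward zero.
 */
uint64_t flush_time_ms(uint64_t bytes, uint64_t baud)
{
	return bytes * 10 * 1000 / baud;
}

/* The A * B < C check, with C given in milliseconds. */
bool fits_threshold(uint64_t bytes, uint64_t baud, uint64_t thresh_ms)
{
	return flush_time_ms(bytes, baud) < thresh_ms;
}
```

Under these assumptions, flushing 131072 bytes takes about 11.4 seconds at 115200 baud, which fits under a 20 second threshold, and about 136.5 seconds at 9600 baud, which does not. So a full-buffer bound may or may not stay under the watchdog threshold depending on console speed and buffer size, and B grows further with every additional enabled console.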
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 10:38 ` Sergey Senozhatsky @ 2018-01-11 11:50 ` Petr Mladek 2018-01-11 16:29 ` Steven Rostedt 1 sibling, 0 replies; 140+ messages in thread From: Petr Mladek @ 2018-01-11 11:50 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Thu 2018-01-11 19:38:45, Sergey Senozhatsky wrote: > On (01/11/18 10:34), Petr Mladek wrote: > [..] > > > except that handing off a console_sem to atomic task when there > > > is O(logbuf) > watchdog_thresh is a regression, basically... > > > it is what it is. > > > > How this could be a regression? Is not the victim that handles > > other printk's random? What protected the atomic task to > > handle the other printks before this patch? > > the non-atomic -> atomic context console_sem transfer. we previously > would have kept the console_sem owner to its non-atomic owner. we now > will make sure that if printk from atomic context happens then it will > make it to console_unlock() loop. > emphasis on O(logbuf) > watchdog_thresh. Sergey, please, why do you completely and repeatedly ignore that argument about statistical effects? Yes, the above scenario is possible. But Steven's patch might also move the owner from atomic context to a non-atomic one. The chances should be more or less equal. The main advantage is that the owner is moved. This should statistically lower the chance of a soft-lockup. > > > Or do you have a system that started to suffer from softlockups > > with this patchset and did not do this before? > [..] > > Do you know about any system where this patch made the softlockup > > deterministically or statistically more likely, please? 
> > I have explained many, many times why my boards die just like before. > why would I bother collecting any numbers... Is it with your own printk stress tests or during "normal" work? If it is during a normal work, is there any chance that we could have a look at the logs? Best Regards, Petr ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 10:38 ` Sergey Senozhatsky 2018-01-11 11:50 ` Petr Mladek @ 2018-01-11 16:29 ` Steven Rostedt 2018-01-12 1:30 ` Steven Rostedt 2018-01-12 2:56 ` Sergey Senozhatsky 1 sibling, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-11 16:29 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Thu, 11 Jan 2018 19:38:45 +0900 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > > the non-atomic -> atomic context console_sem transfer. we previously > would have kept the console_sem owner to its non-atomic owner. we now > will make sure that if printk from atomic context happens then it will > make it to console_unlock() loop. > emphasis on O(logbuf) > watchdog_thresh. > > > - if the patch's goal is to bound (not necessarily to watchdog's threshold) > the amount of time we spend in console_unlock(), then the patch is kinda > overcomplicated. but no further questions in this case. It's goal is to keep printk from running amok on a single CPU like it currently does. This prevents one printk from never ending. And it is far from complex. It doesn't deal with "offloading". The "handover" is only done to those that are doing printks. What do you do if all CPUs are in "critical sections", how would a "handoff to safe" work? Will the printks never get out? If the machine were to triple fault and reboot, we lost all of it. > > - but if the patch's goal is to bound (to lockup threshold) the amount of > time spent in console_unlock() in order to avoid lockups [uh, a reason], > then the patch is rather oversimplified. It's bound to print all the information that has been added to the printk buffer. 
You want to bound it to some "time" and what about the printks that haven't gotten out yet? Delay them to something else, and if the machine were to crash in the transfer, we lost all that data. My method, there's really no delay between a hand off. There's always an active CPU doing printing. It matches the current method which works well for getting information out. A delayed approach will break that and that's what people like myself, Peter, Linus and others are worried about. > > > claiming that for any given A, B, C the following is always true > > A * B < C > > where > A is the amount of data to print in the worst case > B the time call_console_drivers() needs to print a single > char to all registered and enabled consoles > C the watchdog's threshold > > is not really a step forward. It's no different than what we have, except that we currently have A being infinite. My patch makes A no longer infinite, but a constant. Yes that constant is mutable, but it's still a constant, and controlled by the user. That to me is definitely a BIG step forward. > > and the "last console_sem owner prints all pending messages" rule > is still there. > > > > Or do you have a system that started to suffer from softlockups > > with this patchset and did not do this before? > [..] > > Do you know about any system where this patch made the softlockup > > deterministically or statistically more likely, please? > > I have explained many, many times why my boards die just like before. > why would I bother collecting any numbers... Great, and there's cases that die that my patch solves. Lets add my patch now since it is orthogonal to an offloading approach and see how it works, because it would solve issues that I have hit. If you can show that this isn't good enough we can add another approach. We are solving two different problems. My patch simply makes one printk() no longer unbounded. It's a fixed time. Honestly, I don't see why you are against this patch. 
It doesn't stop your work. If this patch isn't enough (but it does fix some issues), then we can look at adding other approaches. Really, it sounds like you are afraid of this patch, that it might be good enough for most cases which would make adding another approach even more difficult. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
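For readers following the hand-off argument, the owner/waiter rule from the patch description (a printk() caller becomes the waiter only if there is an active owner that is not itself and no waiter is set yet) can be modelled as below. This is a single-threaded toy, not the kernel implementation: the patch describes console_owner as a task guarded by console_owner_lock, whereas this sketch uses plain CPU numbers and no locking, and `struct console_state`, `try_become_waiter` and `finish_record` are names assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Toy state; in the patch the equivalent state is guarded by console_owner_lock. */
struct console_state {
	int owner;	/* CPU currently calling the console drivers, or NO_CPU */
	int waiter;	/* CPU spinning to take over, or NO_CPU */
};

/* A printk() caller that failed console_trylock() tries to queue as the waiter. */
bool try_become_waiter(struct console_state *s, int cpu)
{
	if (s->owner == NO_CPU || s->owner == cpu || s->waiter != NO_CPU)
		return false;
	s->waiter = cpu;
	return true;
}

/*
 * Called by the owner after it finishes one record. Returns true if
 * printing was handed to the waiter, so the old owner can return.
 */
bool finish_record(struct console_state *s)
{
	if (s->waiter == NO_CPU)
		return false;
	s->owner = s->waiter;	/* the waiter continues printing */
	s->waiter = NO_CPU;
	return true;
}
```

In this model the hand-off happens between two records with no gap in ownership, which is the property the message above relies on when it says there is always an active CPU doing the printing.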
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-11 16:29 ` Steven Rostedt
@ 2018-01-12 1:30 ` Steven Rostedt
2018-01-12 2:55 ` Steven Rostedt
2018-01-12 3:12 ` Sergey Senozhatsky
2018-01-12 2:56 ` Sergey Senozhatsky
1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-12 1:30 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Thu, 11 Jan 2018 11:29:08 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> > claiming that for any given A, B, C the following is always true
> >
> >	A * B < C
> >
> > where
> >	A is the amount of data to print in the worst case
> >	B the time call_console_drivers() needs to print a single
> >	  char to all registered and enabled consoles
> >	C the watchdog's threshold
> >
> > is not really a step forward.
>
> It's no different than what we have, except that we currently have A
> being infinite. My patch makes A no longer infinite, but a constant.
> Yes that constant is mutable, but it's still a constant, and
> controlled by the user. That to me is definitely a BIG step forward.

I have to say that your analysis here really does point out the benefit
of my patch.

Today, printk() can print for a time of A * B, where, as you state
above:

	A is the amount of data to print in the worst case
	B the time call_console_drivers() needs to print a single
	  char to all registered and enabled consoles

In the worst case, with the current approach, A is infinite. That is,
printk() never stops, as long as there is a printk happening on another
CPU before B can finish. A will keep growing. The call to printk() will
never return. The more CPUs you have, the more likely this will occur.
All it takes is a few CPUs doing periodic printks. If there is a slow
console, where the periodic printks on other CPUs occur quicker than the
first can finish, the first one will be stuck forever. It doesn't take
much to have this happen.

With my patch, A is fixed to the size of the buffer. A single printk()
can never print more than that. If another CPU comes in and does a
printk, then it will take over the task of printing, and release the
first printk.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
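The difference between "A keeps growing" and "A is fixed" can be illustrated with a deterministic toy model. It is a sketch under stated assumptions, not a simulation of the real code: one step prints one record, another CPU appends one record every `arrive_every` steps, and with hand-off enabled that CPU takes over printing immediately. `longest_stint` and its parameters are hypothetical names, and printks arriving from interrupts on the printing CPU itself are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code. Returns the longest run of records
 * printed back to back by a single owner within `steps` steps.
 */
size_t longest_stint(size_t initial, size_t arrive_every, size_t steps,
		     bool handoff)
{
	size_t pending = initial, stint = 0, longest = 0;

	for (size_t step = 1; step <= steps && pending > 0; step++) {
		pending--;		/* current owner prints one record */
		stint++;
		if (stint > longest)
			longest = stint;

		if (arrive_every && step % arrive_every == 0) {
			pending++;	/* another CPU adds a record */
			if (handoff)
				stint = 0;	/* that CPU becomes the new owner */
		}
	}
	return longest;
}
```

With one record arriving per record printed, `longest_stint(1, 1, 1000, false)` runs for all 1000 simulated steps, and would run for as long as the simulation is allowed to, while `longest_stint(1, 1, 1000, true)` never lets one owner print more than a single record in a row. When no other CPU calls printk(), the owner simply drains what is pending, which is where the buffer-size bound comes from.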
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 1:30 ` Steven Rostedt @ 2018-01-12 2:55 ` Steven Rostedt 2018-01-12 4:20 ` Steven Rostedt 2018-01-16 19:44 ` Tejun Heo 2018-01-12 3:12 ` Sergey Senozhatsky 1 sibling, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-12 2:55 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Thu, 11 Jan 2018 20:30:57 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > I have to say that your analysis here really does point out the benefit > of my patch. > > Today, printk() can print for a time of A * B, where, as you state > above: > > A is the amount of data to print in the worst case > B the time call_console_drivers() needs to print a single > char to all registered and enabled consoles > > In the worse case, the current approach is A is infinite. That is, > printk() never stops, as long as there is a printk happening on another > CPU before B can finish. A will keep growing. The call to printk() will > never return. The more CPUs you have, the more likely this will occur. > All it takes is a few CPUs doing periodic printks. If there is a slow > console, where the periodic printk on other CPUs occur quicker than the > first can finish, the first one will be stuck forever. Doesn't take > much to have this happen. > > With my patch, A is fixed to the size of the buffer. A single printk() > can never print more than that. If another CPU comes in and does a > printk, then it will take over the task of printing, and release the > first printk. In fact, below is a module I made (starting with Tejun's crazy stress test, then removing all the craziness). This simple module locks up the system without my patch. 
After applying my patch, the system runs fine.

All I did was start off a work queue on each CPU, and each CPU does one
printk() followed by a millisecond sleep. No 10,000 printks, nothing
in an interrupt handler. Preemption is disabled while the printk
happens, but that's normal.

This is much closer to an OOM happening all over the system, where OOM
stack dumps are occurring on different CPUs.

I ran this on a box with 4 CPUs and a serial console (so it has a slow
console). Again, all I have is each CPU doing exactly ONE printk()!
then sleeping for a full millisecond! It will cause a lot of output,
and perhaps slow the system down. But it should not lock up the system.
But without my patch, it does!

Try it! Test it on a box, and it will lock up. Then add my patch and
see what the results are. I think this speaks very loudly in favor of
applying my patch.

Again, the below module locks up my system immediately without my
patch. With my patch, no problem. In fact, it's still running, while I
wrote this email, and it hardly shows a slow down in the system.

-- Steve

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;

static void preempt_printk_workfn(struct work_struct *work)
{
	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		printk("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT");
		preempt_enable();
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply [flat|nested] 140+ messages in thread
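The A * B argument above can be made concrete with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: the function name longest_console_hold() and all of its parameters are hypothetical, time is counted in abstract ticks, and the handoff is reduced to "the first CPU that calls printk() while another CPU is printing takes over after the current message".

```c
#include <assert.h>

/*
 * Toy model, not kernel code. Time advances in ticks. Each CPU that is
 * not busy in the console path calls printk() once every `period` ticks,
 * and the console needs `cost` ticks to emit one message. With `handoff`
 * set, a CPU that calls printk() while another CPU is printing becomes
 * the waiter and takes over after the current message.
 *
 * Returns the longest stretch, in ticks, that one CPU stayed the owner.
 */
long longest_console_hold(int ncpus, int period, int cost, long ticks,
			  int handoff)
{
	long pending = 0;	/* messages stored but not yet printed */
	long hold_start = 0, max_hold = 0;
	int owner = -1, waiter = -1;
	int busy = 0;		/* ticks left on the message being printed */
	long t;
	int cpu;

	assert(ncpus > 0 && period > 0 && cost > 0);

	for (t = 0; t < ticks; t++) {
		for (cpu = 0; cpu < ncpus; cpu++) {
			if (cpu == owner || cpu == waiter)
				continue;	/* stuck inside printk() */
			if ((t + cpu) % period != 0)
				continue;	/* not this CPU's turn yet */
			pending++;
			if (owner < 0) {
				owner = cpu;
				hold_start = t;
			} else if (handoff && waiter < 0) {
				waiter = cpu;
			}
		}

		if (owner < 0)
			continue;
		if (busy == 0)
			busy = cost;
		if (--busy > 0)
			continue;

		/* One message has just been written out. */
		pending--;
		if (handoff && waiter >= 0) {
			if (t + 1 - hold_start > max_hold)
				max_hold = t + 1 - hold_start;
			owner = waiter;
			waiter = -1;
			hold_start = t + 1;
		} else if (pending == 0) {
			if (t + 1 - hold_start > max_hold)
				max_hold = t + 1 - hold_start;
			owner = -1;
		}
	}

	/* Account for a CPU that is still printing when the run ends. */
	if (owner >= 0 && ticks - hold_start > max_hold)
		max_hold = ticks - hold_start;
	return max_hold;
}
```

In this model, with 4 CPUs, one printk() per 10 ticks on each CPU and 5 ticks per message, the console cannot keep up. Without handoff one CPU ends up printing for nearly the whole run; with handoff each CPU is released after a couple of messages, even though the backlog itself still grows.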
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 2:55 ` Steven Rostedt @ 2018-01-12 4:20 ` Steven Rostedt 2018-01-16 19:44 ` Tejun Heo 1 sibling, 0 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-12 4:20 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Thu, 11 Jan 2018 21:55:47 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > I ran this on a box with 4 CPUs and a serial console (so it has a slow > console). Again, all I have is each CPU doing exactly ONE printk()! > then sleeping for a full millisecond! It will cause a lot of output, > and perhaps slow the system down. But it should not lock up the system. > But without my patch, it does! I decided to see how this works without a slow serial console. So I rebooted the box and enabled hyper-threading (doubling the number of CPUs to 8), and then ran this module, with serial disabled. As expected, it did not lock up. That's because there was only a single console (VGA) and it is fast enough to keep up. Especially, since I have a 1 millisecond sleep between printks. But I ran the function_graph tracer to see what was happening. Here's the unpatched case. It didn't take long to see a single CPU suffering (and this is with a fast console!) 
kworker/1:2-309 [001] 78.677770: funcgraph_entry: | printk() { kworker/7:1-176 [007] 78.677772: funcgraph_entry: | printk() { kworker/3:1-72 [003] 78.677772: funcgraph_entry: | printk() { kworker/7:1-176 [007] 78.677778: funcgraph_exit: 4.528 us | } kworker/3:1-72 [003] 78.677779: funcgraph_exit: 5.875 us | } kworker/0:0-3 [000] 78.678745: funcgraph_entry: | printk() { kworker/5:1-78 [005] 78.678749: funcgraph_entry: | printk() { kworker/4:1-73 [004] 78.678751: funcgraph_entry: | printk() { kworker/0:0-3 [000] 78.678752: funcgraph_exit: 4.893 us | } kworker/5:1-78 [005] 78.678754: funcgraph_exit: 4.287 us | } kworker/4:1-73 [004] 78.678756: funcgraph_exit: 3.964 us | } kworker/6:1-147 [006] 78.679751: funcgraph_entry: | printk() { kworker/2:3-1295 [002] 78.679753: funcgraph_entry: | printk() { kworker/6:1-147 [006] 78.679767: funcgraph_exit: + 13.735 us | } kworker/2:3-1295 [002] 78.679768: funcgraph_exit: + 14.318 us | } kworker/7:1-176 [007] 78.680751: funcgraph_entry: | printk() { kworker/3:1-72 [003] 78.680753: funcgraph_entry: | printk() { kworker/7:1-176 [007] 78.680756: funcgraph_exit: 3.981 us | } kworker/3:1-72 [003] 78.680757: funcgraph_exit: 3.499 us | } kworker/5:1-78 [005] 78.681734: funcgraph_entry: 3.388 us | printk(); kworker/4:1-73 [004] 78.681752: funcgraph_entry: | printk() { kworker/0:0-3 [000] 78.681753: funcgraph_entry: | printk() { kworker/4:1-73 [004] 78.681756: funcgraph_exit: 3.009 us | } kworker/0:0-3 [000] 78.681757: funcgraph_exit: 3.708 us | } kworker/2:3-1295 [002] 78.682742: funcgraph_entry: | printk() { kworker/6:1-147 [006] 78.682746: funcgraph_entry: | printk() { kworker/2:3-1295 [002] 78.682749: funcgraph_exit: 4.548 us | } kworker/6:1-147 [006] 78.682750: funcgraph_exit: 3.001 us | } kworker/3:1-72 [003] 78.683751: funcgraph_entry: | printk() { kworker/7:1-176 [007] 78.683753: funcgraph_entry: | printk() { kworker/3:1-72 [003] 78.683756: funcgraph_exit: 3.869 us | } kworker/7:1-176 [007] 78.683757: funcgraph_exit: 4.300 us | } 
kworker/5:1-78 [005] 78.684736: funcgraph_entry: 2.074 us | printk(); kworker/4:1-73 [004] 78.684755: funcgraph_entry: | printk() { kworker/0:0-3 [000] 78.684755: funcgraph_entry: 3.065 us | printk(); kworker/4:1-73 [004] 78.684760: funcgraph_exit: 4.091 us | } kworker/6:1-147 [006] 78.685744: funcgraph_entry: | printk() { kworker/2:3-1295 [002] 78.685744: funcgraph_entry: 4.616 us | printk(); kworker/6:1-147 [006] 78.685752: funcgraph_exit: 5.943 us | } kworker/7:1-176 [007] 78.686763: funcgraph_entry: | printk() { kworker/3:1-72 [003] 78.686767: funcgraph_entry: | printk() { kworker/7:1-176 [007] 78.686770: funcgraph_exit: 4.570 us | } kworker/3:1-72 [003] 78.686771: funcgraph_exit: 3.262 us | } kworker/1:2-309 [001] 78.687626: funcgraph_exit: # 9854.982 us | } CPU 1 was stuck for 9 milliseconds doing nothing but handling printk. And this is without a serial or slow console. With a patched kernel: kworker/7:1-176 [007] 85.937411: funcgraph_entry: | printk() { kworker/3:1-72 [003] 85.937416: funcgraph_exit: 3.357 us | } kworker/7:1-176 [007] 85.937416: funcgraph_exit: 4.388 us | } kworker/2:2-315 [002] 85.937793: funcgraph_exit: # 1391.842 us | } kworker/1:2-592 [001] 85.938391: funcgraph_entry: | printk() { kworker/4:2-529 [004] 85.938396: funcgraph_entry: 3.267 us | printk(); kworker/6:1-150 [006] 85.938555: funcgraph_exit: # 1159.354 us | } kworker/0:2-127 [000] 85.939393: funcgraph_entry: | printk() { kworker/5:2-352 [005] 85.939394: funcgraph_entry: + 13.403 us | printk(); kworker/1:2-592 [001] 85.939718: funcgraph_exit: # 1325.211 us | } kworker/0:2-127 [000] 85.940345: funcgraph_exit: ! 
951.361 us | } kworker/7:1-176 [007] 85.940390: funcgraph_entry: | printk() { kworker/3:1-72 [003] 85.940390: funcgraph_entry: | printk() { kworker/2:2-315 [002] 85.940391: funcgraph_entry: | printk() { kworker/7:1-176 [007] 85.940396: funcgraph_exit: 4.144 us | } kworker/2:2-315 [002] 85.940397: funcgraph_exit: 5.687 us | } kworker/4:2-529 [004] 85.941403: funcgraph_entry: | printk() { kworker/6:1-150 [006] 85.941407: funcgraph_entry: 3.167 us | printk(); kworker/3:1-72 [003] 85.941545: funcgraph_exit: # 1153.899 us | } kworker/4:2-529 [004] 85.942371: funcgraph_exit: ! 966.322 us | } kworker/1:2-592 [001] 85.942411: funcgraph_entry: | printk() { kworker/5:2-352 [005] 85.942411: funcgraph_entry: | printk() { kworker/1:2-592 [001] 85.942416: funcgraph_exit: 4.099 us | } kworker/0:2-127 [000] 85.942553: funcgraph_entry: | printk() { kworker/5:2-352 [005] 85.942739: funcgraph_exit: ! 326.853 us | } kworker/0:2-127 [000] 85.943358: funcgraph_exit: ! 804.095 us | } kworker/2:2-315 [002] 85.943388: funcgraph_entry: | printk() { kworker/7:1-176 [007] 85.943391: funcgraph_entry: | printk() { kworker/2:2-315 [002] 85.943754: funcgraph_exit: ! 364.921 us | } kworker/7:1-176 [007] 85.944127: funcgraph_exit: ! 734.864 us | } kworker/6:1-150 [006] 85.944408: funcgraph_entry: | printk() { kworker/3:1-72 [003] 85.944408: funcgraph_entry: 4.911 us | printk(); kworker/6:1-150 [006] 85.945235: funcgraph_exit: ! 826.596 us | } kworker/0:2-127 [000] 85.945398: funcgraph_entry: | printk() { kworker/5:2-352 [005] 85.945399: funcgraph_entry: | printk() { kworker/4:2-529 [004] 85.945400: funcgraph_entry: | printk() { kworker/1:2-592 [001] 85.945412: funcgraph_entry: | printk() { kworker/5:2-352 [005] 85.945415: funcgraph_exit: + 14.537 us | } kworker/4:2-529 [004] 85.945416: funcgraph_exit: 5.494 us | } kworker/0:2-127 [000] 85.945736: funcgraph_exit: ! 
337.000 us | } kworker/7:1-176 [007] 85.946403: funcgraph_entry: | printk() { kworker/2:2-315 [002] 85.946409: funcgraph_entry: 3.275 us | printk(); kworker/1:2-592 [001] 85.946546: funcgraph_exit: # 1133.155 us | } The load is spread out much better. No one CPU is stuck too badly. As the function_graph tracer annotates functions that take over a millisecond with a '#', I can grep and see how many take that long, and for how long. $ trace-cmd report trace-printk-nopatch-8cpus.dat |grep '#' kworker/4:1-73 [004] 78.658973: funcgraph_exit: # 1247.220 us | } kworker/2:3-1295 [002] 78.662340: funcgraph_exit: # 2616.456 us | } kworker/7:1-176 [007] 78.671727: funcgraph_exit: # 1996.234 us | } kworker/4:1-73 [004] 78.676696: funcgraph_exit: # 2954.230 us | } kworker/1:2-309 [001] 78.687626: funcgraph_exit: # 9854.982 us | } kworker/5:1-78 [005] 78.692652: funcgraph_exit: # 4920.607 us | } kworker/5:1-78 [005] 78.696737: funcgraph_exit: # 1983.090 us | } kworker/5:1-78 [005] 78.701426: funcgraph_exit: # 1686.832 us | } kworker/2:3-1295 [002] 78.710736: funcgraph_exit: # 6975.033 us | } kworker/1:2-309 [001] 78.712455: funcgraph_exit: # 1711.895 us | } kworker/7:1-176 [007] 78.721588: funcgraph_exit: # 7835.767 us | } kworker/1:2-309 [001] 78.729626: funcgraph_exit: # 5879.358 us | } kworker/3:1-72 [003] 78.744426: funcgraph_exit: # 12678.256 us | } kworker/1:2-309 [001] 78.754549: funcgraph_exit: # 7816.182 us | } kworker/7:1-176 [007] 78.758612: funcgraph_exit: # 1874.185 us | } kworker/5:1-78 [005] 78.762615: funcgraph_exit: # 1878.463 us | } kworker/2:3-1295 [002] 78.771593: funcgraph_exit: # 6849.619 us | } kworker/3:1-72 [003] 78.776616: funcgraph_exit: # 2868.446 us | } kworker/1:2-309 [001] 78.780585: funcgraph_exit: # 2843.085 us | } kworker/7:1-176 [007] 78.785701: funcgraph_exit: # 3949.963 us | } kworker/1:2-309 [001] 78.787192: funcgraph_exit: # 1452.146 us | } kworker/2:3-1295 [002] 78.791554: funcgraph_exit: # 2821.999 us | } kworker/5:1-78 [005] 78.793686: 
funcgraph_exit: # 1934.499 us | } kworker/2:3-1295 [002] 78.795377: funcgraph_exit: # 1641.652 us | } kworker/6:1-147 [006] 78.815413: funcgraph_exit: # 2669.295 us | } kworker/5:1-78 [005] 78.821529: funcgraph_exit: # 1782.758 us | } kworker/5:1-78 [005] 78.826732: funcgraph_exit: # 2993.772 us | } kworker/6:1-147 [006] 78.829676: funcgraph_exit: # 1920.164 us | } kworker/5:1-78 [005] 78.831464: funcgraph_exit: # 1728.834 us | } kworker/1:2-309 [001] 78.833674: funcgraph_exit: # 1939.356 us | } kworker/1:2-309 [001] 78.839663: funcgraph_exit: # 3908.825 us | } kworker/5:1-78 [005] 78.841376: funcgraph_exit: # 1624.089 us | } kworker/1:2-309 [001] 78.843474: funcgraph_exit: # 1725.975 us | } kworker/5:1-78 [005] 78.845490: funcgraph_exit: # 1753.258 us | } kworker/5:1-78 [005] 78.850592: funcgraph_exit: # 2839.801 us | } kworker/2:3-1295 [002] 78.855668: funcgraph_exit: # 3925.402 us | } kworker/6:1-147 [006] 78.866346: funcgraph_exit: # 10603.155 us | } CPUs can be stuck for over 10 milliseconds doing just printk! 
With my patch: kworker/0:2-127 [000] 85.902486: funcgraph_exit: # 1092.105 us | } kworker/2:2-315 [002] 85.904458: funcgraph_exit: # 1070.174 us | } kworker/4:2-529 [004] 85.907523: funcgraph_exit: # 1131.189 us | } kworker/6:1-150 [006] 85.909187: funcgraph_exit: # 1802.074 us | } kworker/7:1-176 [007] 85.910534: funcgraph_exit: # 1138.249 us | } kworker/1:2-592 [001] 85.911586: funcgraph_exit: # 1207.807 us | } kworker/2:2-315 [002] 85.914585: funcgraph_exit: # 1183.669 us | } kworker/6:1-150 [006] 85.915426: funcgraph_exit: # 1019.587 us | } kworker/5:2-352 [005] 85.916516: funcgraph_exit: # 1120.144 us | } kworker/3:1-72 [003] 85.922472: funcgraph_exit: # 1071.437 us | } kworker/4:2-529 [004] 85.923685: funcgraph_exit: # 1296.953 us | } kworker/1:2-592 [001] 85.924481: funcgraph_exit: # 1051.758 us | } kworker/5:2-352 [005] 85.926536: funcgraph_exit: # 1126.423 us | } kworker/2:2-315 [002] 85.927403: funcgraph_exit: # 1020.366 us | } kworker/1:2-592 [001] 85.928493: funcgraph_exit: # 1094.864 us | } kworker/6:1-150 [006] 85.931457: funcgraph_exit: # 1052.531 us | } kworker/1:2-592 [001] 85.932779: funcgraph_exit: # 1371.806 us | } kworker/5:2-352 [005] 85.933536: funcgraph_exit: # 1128.199 us | } kworker/2:2-315 [002] 85.937793: funcgraph_exit: # 1391.842 us | } kworker/6:1-150 [006] 85.938555: funcgraph_exit: # 1159.354 us | } kworker/1:2-592 [001] 85.939718: funcgraph_exit: # 1325.211 us | } kworker/3:1-72 [003] 85.941545: funcgraph_exit: # 1153.899 us | } kworker/1:2-592 [001] 85.946546: funcgraph_exit: # 1133.155 us | } kworker/7:1-176 [007] 85.947730: funcgraph_exit: # 1325.744 us | } kworker/3:1-72 [003] 85.948588: funcgraph_exit: # 1192.876 us | } kworker/4:2-529 [004] 85.950647: funcgraph_exit: # 2248.783 us | } kworker/6:1-150 [006] 85.951463: funcgraph_exit: # 1045.498 us | } kworker/0:2-127 [000] 85.952576: funcgraph_exit: # 1171.645 us | } kworker/1:2-592 [001] 85.953393: funcgraph_exit: # 1001.659 us | } kworker/5:2-352 [005] 85.955542: 
funcgraph_exit: # 1130.396 us | }

It spreads the load out much nicer, and seldom goes over 2 milliseconds.

My trace was only for a few seconds (no events lost), and I can see the
max with:

$ trace-cmd report trace-printk-nopatch-8cpus.dat | grep '#' | cut -d'#' -f2 | sort -n | tail -20
 13510.063 us | }
 13531.914 us | }
 13533.591 us | }
 13574.488 us | }
 13584.322 us | }
 13611.234 us | }
 13668.255 us | }
 13710.294 us | }
 13722.017 us | }
 13725.000 us | }
 13728.883 us | }
 13740.601 us | }
 13744.194 us | }
 13770.512 us | }
 13776.246 us | }
 13809.729 us | }
 13812.279 us | }
 13830.563 us | }
 13907.382 us | }
 14498.937 us | }

We had a printk take up to 14 milliseconds with a VGA console on 8 CPUs,
where each CPU was doing a single printk once per millisecond.

With my patch:

$ trace-cmd report trace-printk-patch-8cpus.dat | grep '#' | cut -d'#' -f2 | sort -n | tail -20
 2477.627 us | }
 2482.012 us | }
 2482.077 us | }
 2488.672 us | }
 2490.253 us | }
 2502.381 us | }
 2503.990 us | }
 2505.448 us | }
 2509.389 us | }
 2510.868 us | }
 2511.597 us | }
 2512.108 us | }
 2538.886 us | }
 3095.917 us | }
 3137.604 us | }
 3223.213 us | }
 3324.967 us | }
 3331.018 us | }
 3331.518 us | }
 3348.263 us | }

We got up to just over 3 milliseconds for a single printk. I think
that's a damn good improvement.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
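The grep/cut/sort pipeline above pulls out the duration that follows each '#' marker in the trace-cmd report. The same extraction can be sketched in C for a report held in memory. The helper name max_marked_us() is hypothetical, and the sketch assumes the only '#' characters followed by a number in the text are the function_graph duration markers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan a trace-cmd report for durations marked with '#' (calls that took
 * over a millisecond). Returns the largest such duration in microseconds
 * and stores the number of marked entries in *count when count is set.
 */
double max_marked_us(const char *report, int *count)
{
	const char *p = report;
	double max = 0.0;
	int n = 0;

	assert(report != NULL);

	while ((p = strchr(p, '#')) != NULL) {
		char *end;
		/* strtod() skips the blanks between '#' and the number. */
		double us = strtod(p + 1, &end);

		if (end != p + 1) {
			n++;
			if (us > max)
				max = us;
		}
		p++;
	}

	if (count)
		*count = n;
	return max;
}
```

Lines marked with '!' or '+' are ignored, matching what grep '#' does in the shell version.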
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 2:55 ` Steven Rostedt 2018-01-12 4:20 ` Steven Rostedt @ 2018-01-16 19:44 ` Tejun Heo 2018-01-17 9:12 ` Petr Mladek 1 sibling, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-16 19:44 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel Hello, Steven. On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > All I did was start off a work queue on each CPU, and each CPU does one > printk() followed by a millisecond sleep. No 10,000 printks, nothing > in an interrupt handler. Preemption is disabled while the printk > happens, but that's normal. > > This is much closer to an OOM happening all over the system, where OOMs > stack dumps are occurring on different CPUS. OOMs can't happen all over the system. It can only happen on a single CPU at a time. If you're printing from multiple CPUs, your solution would work great. That is the situation your patches are designed to address to begin with. That isn't the problem that I reported tho. I understand that your solution works for that class of problems and that is great. I really wish that it could address the other class of problems too tho, and it doesn't seem like it would be that difficult to cover both cases, right? Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-16 19:44 ` Tejun Heo @ 2018-01-17 9:12 ` Petr Mladek 2018-01-17 15:15 ` Tejun Heo 0 siblings, 1 reply; 140+ messages in thread From: Petr Mladek @ 2018-01-17 9:12 UTC (permalink / raw) To: Tejun Heo Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Tue 2018-01-16 11:44:56, Tejun Heo wrote: > Hello, Steven. > > On Thu, Jan 11, 2018 at 09:55:47PM -0500, Steven Rostedt wrote: > > All I did was start off a work queue on each CPU, and each CPU does one > > printk() followed by a millisecond sleep. No 10,000 printks, nothing > > in an interrupt handler. Preemption is disabled while the printk > > happens, but that's normal. > > > > This is much closer to an OOM happening all over the system, where OOMs > > stack dumps are occurring on different CPUS. > > OOMs can't happen all over the system. It can only happen on a single > CPU at a time. If you're printing from multiple CPUs, your solution > would work great. That is the situation your patches are designed to > address to begin with. That isn't the problem that I reported tho. I > understand that your solution works for that class of problems and > that is great. I really wish that it could address the other class of > problems too tho, and it doesn't seem like it would be that difficult > to cover both cases, right? IMHO, the bad scenario with OOM was that any printk() called in the OOM report became console_lock owner and was responsible for pushing all new messages to the console. There was a possible livelock because OOM Killer was blocked in console_unlock() while other CPUs repeatedly complained about failed allocations. Even the current patch should help. 
It allows the console_lock to be handed off to another CPU, so the OOM
killer can eventually continue.

Of course, it is possible that it might not be enough. For example,
there might still be too many messages to print when the memory is
freed. Then there will be no more complaints, no more handoffs, and
the last console_lock owner might still cause a softlockup. But that
will still be better than the livelock.

Of course, we will need to address the softlockup. But let's see
how this works in practice.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 9:12 ` Petr Mladek
@ 2018-01-17 15:15 ` Tejun Heo
2018-01-17 17:12 ` Steven Rostedt
0 siblings, 1 reply; 140+ messages in thread
From: Tejun Heo @ 2018-01-17 15:15 UTC (permalink / raw)
To: Petr Mladek
Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Hello,

On Wed, Jan 17, 2018 at 10:12:08AM +0100, Petr Mladek wrote:
> IMHO, the bad scenario with OOM was that any printk() called in
> the OOM report became console_lock owner and was responsible
> for pushing all new messages to the console. There was a possible
> livelock because OOM Killer was blocked in console_unlock() while
> other CPUs repeatedly complained about failed allocations.

I don't know why we're constantly back into this same loop on this
topic but that's not the problem we've been seeing. There are no
other CPUs involved.

It's great that Steven's patches solve a good number of problems. It
is also true that there's a class of problems that it doesn't solve,
which other approaches do. The productive thing to do here is trying
to solve the unsolved one too, especially given that it doesn't seem
too difficult to do so on top of what's proposed.

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 15:15 ` Tejun Heo
@ 2018-01-17 17:12 ` Steven Rostedt
2018-01-17 18:42 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 17:12 UTC (permalink / raw)
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo <tj@kernel.org> wrote:

> It's great that Steven's patches solve a good number of problems. It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do. The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficult to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

From your previous email:

> 1. Console is IPMI emulated serial console. Super slow. Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages. printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking. Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compile tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
 	return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+	unsigned long pc = preempt_count();
+	int val = this_cpu_read(recursion_bits);
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	if (unlikely(val & (1 << bit)))
+		return true;
+
+	val |= (1 << bit);
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (offload)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+	/*
+	 * Consoles are triggering printks, offload the printks
+	 * to another CPU to hopefully avoid a lockup.
+	 */
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
 	for (;;) {
 		struct printk_log *msg;
+		bool offload;
 		size_t ext_len = 0;
 		size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
 		 * waiter waiting to take over.
 		 */
 		console_lock_spinning_enable();
+		offload = recursion_check_start();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
+		recursion_check_finish(offload);
+
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
 			return;
 		}
+		if (offload)
+			kick_offload_thread();
 
 		printk_safe_exit_irqrestore(flags);

-- Steve

^ permalink raw reply related [flat|nested] 140+ messages in thread
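The context-level recursion check in the patch above relies on two bit tricks: one bit per context, and `val &= val - 1` to clear the lowest set bit. Below is a minimal user-space sketch of that idea, under stated assumptions: the context is passed in explicitly instead of being derived from preempt_count(), a single global stands in for the per-CPU variable, and the function signatures are illustrative rather than the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Lower numbers can preempt higher ones, as in the patch's comment. */
enum ctx { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL };

/* Per-CPU in the patch; one global is enough for a single-threaded demo. */
static unsigned int recursion_bits;

/* Returns true if a printk is already in progress in this context. */
static bool recursion_check_start(enum ctx ctx)
{
	unsigned int bit = 1u << ctx;

	assert(ctx >= CTX_NMI && ctx <= CTX_NORMAL);

	if (recursion_bits & bit)
		return true;

	recursion_bits |= bit;
	return false;
}

/* Undo a successful start. A detected recursion set no bit, so skip it. */
static void recursion_check_finish(bool recursed)
{
	if (recursed)
		return;

	/*
	 * The innermost active context always owns the lowest set bit,
	 * because only lower-numbered contexts can interrupt it.
	 */
	recursion_bits &= recursion_bits - 1;
}
```

For example, a printk in normal context sets bit 3. An interrupt arriving during it sets bit 1, and a second printk from that same interrupt sees bit 1 already set and is reported as recursion. Finishing the interrupt-level printk clears bit 1 and leaves bit 3 for the interrupted caller.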
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 17:12 ` Steven Rostedt
@ 2018-01-17 18:42 ` Steven Rostedt
2018-01-19 18:20 ` Steven Rostedt
2018-01-17 20:05 ` Tejun Heo
2018-01-18 5:42 ` Sergey Senozhatsky
2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 18:42 UTC (permalink / raw)
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Wed, 17 Jan 2018 12:12:51 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>  
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>  
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();
>  

Ah, major flaw in this code. The recursion check needs to be in
printk() itself around the trylock.

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..31df145cc4d7 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,63 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	/* Store it in the record log */
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+	unsigned long pc = preempt_count();
+	int val = this_cpu_read(recursion_bits);
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	if (unlikely(val & (1 << bit)))
+		return true;
+
+	val |= (1 << bit);
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (offload)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+	/*
+	 * Consoles are triggering printks, offload the printks
+	 * to another CPU to hopefully avoid a lockup.
+	 */
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
@@ -1895,12 +1952,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		bool offload;
 		/*
 		 * Disable preemption to avoid being preempted while holding
 		 * console_sem which would prevent anyone from printing to
 		 * console
 		 */
 		preempt_disable();
+		offload = recursion_check_start();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers and wake up
@@ -1908,7 +1967,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+
+		recursion_check_finish(offload);
 		preempt_enable();
+
+		if (offload)
+			kick_offload_thread();
 	}
 
 	return printed_len;

^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 18:42 ` Steven Rostedt
@ 2018-01-19 18:20 ` Steven Rostedt
2018-01-20 7:14 ` Sergey Senozhatsky
2018-01-20 12:19 ` Tejun Heo
0 siblings, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-19 18:20 UTC (permalink / raw)
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Tejun,

I was thinking about this a bit more, and instead of offloading a
recursive printk, perhaps it's best to simply throttle it. Because the
problem may not go away if a printk thread takes over, because the bug
is really the printk infrastructure filling the printk buffer keeping
printk from ever stopping.

This patch detects that printk is causing itself to print more and
throttles it after 3 messages have printed due to recursion. Could you
see if this helps your test cases?

I built this on top of linux-next (yesterday's branch).

It compiles and boots, but I didn't do any other tests on it.

Thanks!

-- Steve

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..2c7f18876224 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1826,6 +1826,75 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	/* Store it in the record log */
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+static DEFINE_PER_CPU(int, recursion_count);
+static atomic_t recursion_overflow;
+static const int recursion_max = 3;
+
+static int __recursion_check_test(int val)
+{
+	unsigned long pc = preempt_count();
+	int bit;
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	return val & (1 << bit);
+}
+
+static bool recursion_check_test(void)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	return __recursion_check_test(val);
+}
+
+static bool recursion_check_start(void)
+{
+	int val = this_cpu_read(recursion_bits);
+	int set;
+
+	set = __recursion_check_test(val);
+
+	if (unlikely(set))
+		return true;
+
+	val |= set;
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool recursion)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (recursion)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
 
 asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
@@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* This stops the holder of console_sem just where we want him */
 	logbuf_lock_irqsave(flags);
+
+	if (recursion_check_test()) {
+		/* A printk happened within a printk at the same context */
+		if (this_cpu_inc_return(recursion_count) > recursion_max) {
+			atomic_inc(&recursion_overflow);
+			logbuf_unlock_irqrestore(flags);
+			printed_len = 0;
+			goto out;
+		}
+	}
+
 	/*
 	 * The printf needs to come first; we need the syslog
 	 * prefix which might be passed-in as a parameter.
@@ -1895,12 +1975,14 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		bool recursion;
 		/*
 		 * Disable preemption to avoid being preempted while holding
 		 * console_sem which would prevent anyone from printing to
 		 * console
 		 */
 		preempt_disable();
+		recursion = recursion_check_start();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers and wake up
@@ -1908,9 +1990,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+
+		recursion_check_finish(recursion);
+		this_cpu_write(recursion_count, 0);
 		preempt_enable();
 	}
-
+out:
 	return printed_len;
 }
 EXPORT_SYMBOL(vprintk_emit);
@@ -2343,9 +2428,14 @@ void console_unlock(void)
 			seen_seq = log_next_seq;
 		}
 
-		if (console_seq < log_first_seq) {
+		if (console_seq < log_first_seq || atomic_read(&recursion_overflow)) {
+			size_t missed;
+
+			missed = atomic_xchg(&recursion_overflow, 0);
+			missed += log_first_seq - console_seq;
+
 			len = sprintf(text, "** %u printk messages dropped **\n",
-				      (unsigned)(log_first_seq - console_seq));
+				      (unsigned)missed);
 
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;

^ permalink raw reply related [flat|nested] 140+ messages in thread
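
The per-context recursion tracking in the patch above can be modelled in user space. The following is only a minimal sketch under stated assumptions: the context is passed in explicitly instead of being derived from preempt_count(), a single CPU is modelled with plain ints, and printk_allowed() is a hypothetical helper standing in for the check added to vprintk_emit(). Note that recursion_check_start() as posted ORs in `set`, which is always zero at that point, so this sketch sets the context bit explicitly; that is an assumption about the intent, not a statement of what was merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the patch: a lower number can preempt a higher one. */
enum ctx { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL };

#define RECURSION_MAX 3

/* Per-CPU variables in the patch; one CPU is modelled here. */
static int recursion_bits;
static int recursion_count;
static int recursion_overflow;	/* atomic_t in the patch */

/*
 * Mark context c as being inside the console path.
 * Returns true if it was already marked (the caller is recursing).
 */
static bool recursion_check_start(enum ctx c)
{
	int bit = 1 << c;

	if (recursion_bits & bit)
		return true;

	recursion_bits |= bit;	/* assumed intent; the posted code ORs in 0 */
	return false;
}

static void recursion_check_finish(bool recursion)
{
	if (recursion)
		return;

	/* The innermost active context is the lowest set bit. */
	recursion_bits &= recursion_bits - 1;
}

/* Hypothetical helper: true if a message may be stored, false if throttled. */
static bool printk_allowed(enum ctx c)
{
	if (!(recursion_bits & (1 << c)))
		return true;	/* not a printk within a printk */

	if (++recursion_count > RECURSION_MAX) {
		recursion_overflow++;
		return false;
	}
	return true;
}
```

With RECURSION_MAX at 3, the fourth recursive message in the same context is dropped and counted in recursion_overflow, while a message from a different context (say an interrupt arriving during the console write) is not treated as recursion.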
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-19 18:20 ` Steven Rostedt
@ 2018-01-20 7:14 ` Sergey Senozhatsky
2018-01-20 15:49 ` Steven Rostedt
2018-01-20 12:19 ` Tejun Heo
1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-20 7:14 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/19/18 13:20), Steven Rostedt wrote:
[..]
> I was thinking about this a bit more, and instead of offloading a
> recursive printk, perhaps its best to simply throttle it. Because the
> problem may not go away if a printk thread takes over, because the bug
> is really the printk infrastructure filling the printk buffer keeping
> printk from ever stopping.

right. I didn't quite get how that would help. if we would
kick_offload every time we add new printks after call_console_drivers(),
then we can just end up in a kick_offload loop traveling across all CPUs.

[..]
>  asmlinkage int vprintk_emit(int facility, int level,
>  			    const char *dict, size_t dictlen,
> @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	/* This stops the holder of console_sem just where we want him */
>  	logbuf_lock_irqsave(flags);
> +
> +	if (recursion_check_test()) {
> +		/* A printk happened within a printk at the same context */
> +		if (this_cpu_inc_return(recursion_count) > recursion_max) {
> +			atomic_inc(&recursion_overflow);
> +			logbuf_unlock_irqrestore(flags);
> +			printed_len = 0;
> +			goto out;
> +		}
> +	}

didn't have time to look at this carefully, but is this possible?

printks from console_unlock()->call_console_drivers() are redirected
to printk_safe buffer. we need irq_work on that CPU to flush its
printk_safe buffer.

>  EXPORT_SYMBOL(vprintk_emit);
> @@ -2343,9 +2428,14 @@ void console_unlock(void)
>  			seen_seq = log_next_seq;
>  		}
>  
> -		if (console_seq < log_first_seq) {
> +		if (console_seq < log_first_seq || atomic_read(&recursion_overflow)) {
> +			size_t missed;
> +
> +			missed = atomic_xchg(&recursion_overflow, 0);
> +			missed += log_first_seq - console_seq;
> +
>  			len = sprintf(text, "** %u printk messages dropped **\n",
> -				      (unsigned)(log_first_seq - console_seq));
> +				      (unsigned)missed);
>  
>  			/* messages are gone, move to first one */
>  			console_seq = log_first_seq;

how are we going to distinguish between lockdep splats, for instance,
or WARNs from call_console_drivers() -> foo_write(), which are valuable,
and kmalloc() print outs, which might be less valuable? are we going to
lose all of them now? then we can do a much simpler thing - steal one
bit from `printk_context' and use it for a new PRINTK_NOOP_CONTEXT, which
will be set around call_console_drivers(). vprintk_func() would redirect
printks to vprintk_noop(fmt, args), which will do nothing.

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-20 7:14 ` Sergey Senozhatsky
@ 2018-01-20 15:49 ` Steven Rostedt
2018-01-21 14:15 ` Sergey Senozhatsky
0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-20 15:49 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Sat, 20 Jan 2018 16:14:02 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> [..]
> >  asmlinkage int vprintk_emit(int facility, int level,
> >  			    const char *dict, size_t dictlen,
> > @@ -1849,6 +1918,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> >  	/* This stops the holder of console_sem just where we want him */
> >  	logbuf_lock_irqsave(flags);
> > +
> > +	if (recursion_check_test()) {
> > +		/* A printk happened within a printk at the same context */
> > +		if (this_cpu_inc_return(recursion_count) > recursion_max) {
> > +			atomic_inc(&recursion_overflow);
> > +			logbuf_unlock_irqrestore(flags);
> > +			printed_len = 0;
> > +			goto out;
> > +		}
> > +	}
> 
> didn't have time to look at this carefully, but is this possible?
> 
> printks from console_unlock()->call_console_drivers() are redirected
> to printk_safe buffer. we need irq_work on that CPU to flush its
> printk_safe buffer.

So is the issue that we keep triggering this irq work then? Then this
solution does seem to be one that would work. Because after x amount of
recursive printks (printk called by printk) it would just stop printing
them, and end the irq work.

Perhaps what Tejun is seeing is:

 printk()
   net_console()
     printk() --> redirected to irq work

 <irq work>
   printk
     net_console()
       printk() --> redirected to another irq work

 and so on and so on.

This solution would need to be tweaked to add a timer to allow only so
many nested printks in a given time. Otherwise it too would be an issue:

  printk()
   net_console()
    printk() -> redirected
     printk() -> throttled

But the first x printk()s would still be redirected. And that x gets
reset in this current patch at the end of the outermost printk. Perhaps
it shouldn't reset x, or it can flush the printk safe buffer first. Is
there a reason that console_unlock() doesn't flush the
printk_safe_buffer? With a throttle number and flushing the
printk_safe_buffer, that should solve the issue Tejun explained.

> 
> how are we going to distinguish between lockdep splats, for instance,
> or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> and kmalloc() print outs, which might be less valuable? are we going to

The problem is that printk causing more printks is extremely dangerous,
and ANY printk that is caused by a printk is of equal value, whether it
is a console driver running out of memory or a lockdep splat. And
the chances of having two hit at the same time is extremely low.

> lose all of them now? then we can do a much simpler thing - steal one
> bit from `printk_context' and use it for a new PRINTK_NOOP_CONTEXT, which
> will be set around call_console_drivers(). vprintk_func() would redirect
> printks to vprintk_noop(fmt, args), which will do nothing.

Not sure what you mean here. Have some pseudo code to demonstrate with?

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
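
The "only so many nested printks in a given time" idea from the message above could look roughly like the following. This is a minimal user-space sketch, not the thread's code: struct recursion_throttle and throttle_allow() are hypothetical names, and time is an abstract tick counter supplied by the caller. The point it illustrates is that the budget is refilled only when the time window expires, rather than at the end of the outermost printk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical throttle state; in the kernel this would be per-CPU. */
struct recursion_throttle {
	uint64_t window_start;	/* tick at which the current window began */
	uint64_t window_len;	/* window length in ticks */
	unsigned int burst;	/* recursive messages allowed per window */
	unsigned int seen;	/* recursive messages accepted in this window */
	unsigned int dropped;	/* recursive messages rejected so far */
};

/*
 * Returns true if a recursive printk may be stored at tick 'now'.
 * The budget is only refilled when a full window has elapsed.
 */
static bool throttle_allow(struct recursion_throttle *t, uint64_t now)
{
	if (now - t->window_start >= t->window_len) {
		t->window_start = now;
		t->seen = 0;
	}

	if (t->seen < t->burst) {
		t->seen++;
		return true;
	}

	t->dropped++;
	return false;
}
```

The dropped counter plays the role of recursion_overflow in the patch: it could feed the "printk messages dropped" line printed from console_unlock(). The kernel's existing ratelimit helpers use a similar interval/burst shape, so a real implementation might reuse them instead of open-coding this.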
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-20 15:49 ` Steven Rostedt
@ 2018-01-21 14:15 ` Sergey Senozhatsky
2018-01-21 21:04 ` Steven Rostedt
2018-01-23 6:40 ` Sergey Senozhatsky
0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-21 14:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Tejun Heo, Petr Mladek, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/20/18 10:49), Steven Rostedt wrote:
[..]
> > printks from console_unlock()->call_console_drivers() are redirected
> > to printk_safe buffer. we need irq_work on that CPU to flush its
> > printk_safe buffer.
> 
> So is the issue that we keep triggering this irq work then? Then this
> solution does seem to be one that would work. Because after x amount of
> recursive printks (printk called by printk) it would just stop printing
> them, and end the irq work.
> 
> Perhaps what Tejun is seeing is:
> 
>  printk()
>    net_console()
>      printk() --> redirected to irq work
> 
>  <irq work>
>    printk
>      net_console()
>        printk() --> redirected to another irq work
> 
>  and so on and so on.

it's a bit trickier than that, I think.

we have printk recursion from console drivers. it's redirected to
printk_safe and we queue an IRQ work to flush the buffer

 printk
  console_unlock
   call_console_drivers
    net_console
     printk
      printk_safe -> irq_work queue

now console_unlock() enables local IRQs, we have the printk_safe
flush. but printk_safe flush does not call into the console_unlock(),
it uses printk_deferred() version of printk

 IRQ work
  printk_safe_flush
   printk_deferred -> irq_work queue

so we schedule another IRQ work (deferred printk work), which eventually
tries to lock console_sem

 IRQ work
  wake_up_klogd_work_func()
   if (console_trylock())
    console_unlock()

if it succeeds then it goes to console_unlock(), where console driver
can cause another printk recursion. but, once again, it will be
redirected to printk_safe buffer first. if it fails then we have either
the original CPU to print out those irq_work messages, which is sort of
bad, or another CPU which already acquired the console_sem and will
print out.

> This solution would need to be tweaked to add a timer to allow only so
> many nested printks in a given time. Otherwise it too would be an issue:
[..]
> > how are we going to distinguish between lockdep splats, for instance,
> > or WARNs from call_console_drivers() -> foo_write(), which are valuable,
> > and kmalloc() print outs, which might be less valuable? are we going to
> 
> The problem is that printk causing more printks is extremely dangerous,
> and ANY printk that is caused by a printk is of equal value, whether it
> is a console driver running out of memory or a lockdep splat. And
> the chances of having two hit at the same time is extremely low.

so.... fix the console drivers ;)




just kidding. ok...
the problem is that we flush printk_safe right when console_unlock() printing
loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
were enabled in the first place when the CPU went to console_unlock()].
this forces that CPU to loop in console_unlock() as long as we have
printk-s coming from call_console_drivers(). but we probably can postpone
printk_safe flush. basically, we can declare a new rule - we don't flush
printk_safe buffer as long as console_sem is locked. because this is how
that printing CPU stuck in the console_unlock() printing loop. printk_safe
buffer is very important when it comes to storing a non-repetitive stuff, like
a lockdep splat, which is a single shot event. but the more repetitive the
message is, like millions of similar kmalloc() dump_stack()-s over and over
again, the less value in it. we should have printk_safe buffer big enough for
important info, like a lockdep splat, but millions of similar kmalloc()
messages are of little value - one is already enough, we can drop the rest.
and we should not flush new messages while there is a CPU looping in
console_unlock(), because it already has messages to print, which were
log_store()-ed the normal way.

this is where the "postpone thing" jumps in. so how do we postpone printk_safe
flush.

we can't console_trylock()/console_unlock() in printk_safe flush code.
but there is a `console_locked' flag and is_console_locked() function which
tell us if the console_sem is locked. as long as we are in console_unlock()
printing loop that flag is set, even if we enabled local IRQs and printk_safe
flush work arrived. so now printk_safe flush does extra check and does
not flush printk_safe buffer content as long as someone is currently
printing or soon will start printing. but we need to take extra step and
to re-queue flush on CPUs that did postpone it [console_unlock() can
reschedule]. so now we flush only when printing CPU printed all pending
logbuf messages, hit the "console_seq == log_next_seq" and up()
console_sem. this sets a boundary -- no matter how many times during the
current printing loop we called console drivers and how many times those
drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.


IOW, what we have now, looks like this:

a) printk_safe is for important stuff, we don't guarantee that a flood
   of messages will be preserved.

b) we extend the previously existing "will flush messages later on from
   a safer context" and now we also consider console_unlock() printing loop
   as unsafe context. so the unsafe context is not only the one that can
   deadlock, but also the one that can lockup CPU in a printing loop because
   of recursive printk messages.


so this

 printk
  console_unlock
  {
     for (;;) {
        call_console_drivers
           net_console
              printk
                 printk_safe -> irq_work queue

		   IRQ work
		     printk_safe_flush
		       printk_deferred -> log_store()
		   iret
     }
     up();
  }


   // which can never break out, because we can always append new messages
   // from printk_safe_flush.

becomes this

 printk
  console_unlock
  {
     for (;;) {
        call_console_drivers
           net_console
              printk
                 printk_safe -> irq_work queue

     }
     up();

   IRQ work
     printk_safe_flush
       printk_deferred -> log_store()
   iret
  }



something completely untested, sketchy and ugly.

---

 kernel/printk/internal.h    |  2 ++
 kernel/printk/printk.c      |  1 +
 kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..e85517818a49 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
+void printk_safe_requeue_flushing(void);
+
 #define printk_safe_enter_irqsave(flags)	\
 	do {					\
 		local_irq_save(flags);		\
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..7aca23e8d7b2 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2428,6 +2428,7 @@ void console_unlock(void)
 	raw_spin_lock(&logbuf_lock);
 	retry = console_seq != log_next_seq;
 	raw_spin_unlock(&logbuf_lock);
+	printk_safe_requeue_flushing();
 	printk_safe_exit_irqrestore(flags);
 
 	if (retry && console_trylock())
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..45d5b292d7e1 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -22,6 +22,7 @@
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
 #include <linux/printk.h>
+#include <linux/console.h>
 
 #include "internal.h"
 
@@ -51,6 +52,7 @@ struct printk_safe_seq_buf {
 	atomic_t		message_lost;
 	struct irq_work		work;	/* IRQ work that flushes the buffer */
 	unsigned char		buffer[SAFE_LOG_BUF_LEN];
+	bool			need_requeue;
 };
 
 static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
@@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work)
 	size_t len;
 	int i;
 
+	s->need_requeue = false;
 	/*
 	 * The lock has two functions. First, one reader has to flush all
 	 * available message to make the lockless synchronization with
@@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work)
 	raw_spin_unlock_irqrestore(&read_lock, flags);
 }
 
+/* NMI buffers are always flushed */
+static void flush_nmi_buffer(struct irq_work *work)
+{
+	__printk_safe_flush(work);
+}
+
+/* printk_safe buffers flushing, on the contrary, can be postponed */
+static void flush_printk_safe_buffer(struct irq_work *work)
+{
+	struct printk_safe_seq_buf *s =
+		container_of(work, struct printk_safe_seq_buf, work);
+
+	if (is_console_locked()) {
+		s->need_requeue = true;
+		return;
+	}
+
+	__printk_safe_flush(work);
+}
+
+void printk_safe_requeue_flushing(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(safe_print_seq, cpu).need_requeue)
+			queue_flush_work(&per_cpu(safe_print_seq, cpu));
+	}
+}
+
 /**
  * printk_safe_flush - flush all per-cpu nmi buffers.
  *
@@ -387,11 +420,11 @@ void __init printk_safe_init(void)
 		struct printk_safe_seq_buf *s;
 
 		s = &per_cpu(safe_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_printk_safe_buffer);
 
 #ifdef CONFIG_PRINTK_NMI
 		s = &per_cpu(nmi_print_seq, cpu);
-		init_irq_work(&s->work, __printk_safe_flush);
+		init_irq_work(&s->work, flush_nmi_buffer);
 #endif
 	}
 
---


> > lose all of them now? then we can do a much simpler thing - steal one
> > bit from `printk_context' and use it for a new PRINTK_NOOP_CONTEXT, which
> > will be set around call_console_drivers(). vprintk_func() would redirect
> > printks to vprintk_noop(fmt, args), which will do nothing.
> 
> Not sure what you mean here. Have some pseudo code to demonstrate with?

sure, I meant that if we want to disable printk recursion from
call_console_drivers(), then we can add another printk_safe section, say
printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP
bit of `printk_context', so when we have printk() under PRINTK_NOOP
then vprintk_func() goes to a special vprintk_noop(fmt, args), which
simply drops the message [does not store any in the per-cpu printk
safe buffer, so we don't flush it and don't add new messages to the
logbuf]. and we annotate call_console_drivers() as a printk_noop
function. but that's a no-brainer and I'd prefer to have another solution.

	-ss

^ permalink raw reply related [flat|nested] 140+ messages in thread
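
The PRINTK_NOOP idea described at the end of the message above can be sketched as a small user-space model. Everything here is an assumption made for illustration: the mask values, the CTX_* names, the enum sink return type, and the *_model() helpers are hypothetical and do not reflect the real `printk_context' bit layout. The sketch only shows the dispatch order: a noop section wins over the printk_safe redirection, so messages produced inside call_console_drivers() are dropped instead of being buffered and flushed later.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a message ends up in this model. */
enum sink { SINK_LOGBUF, SINK_SAFE_BUF, SINK_DROPPED };

/* Hypothetical layout: low bits count printk_safe nesting, one bit is NOOP. */
#define CTX_SAFE_MASK	0x0fffffffu
#define CTX_NOOP_BIT	0x10000000u

static unsigned int printk_context;	/* per-CPU in the kernel */

static void printk_noop_begin(void)
{
	printk_context |= CTX_NOOP_BIT;
}

static void printk_noop_end(void)
{
	printk_context &= ~CTX_NOOP_BIT;
}

/* Stand-in for vprintk_func(): decide which path a printk() takes. */
static enum sink vprintk_func_model(void)
{
	if (printk_context & CTX_NOOP_BIT)
		return SINK_DROPPED;	/* vprintk_noop(): store nothing */

	if (printk_context & CTX_SAFE_MASK)
		return SINK_SAFE_BUF;	/* per-CPU printk_safe buffer */

	return SINK_LOGBUF;		/* normal log_store() path */
}

/* Stand-in for call_console_drivers() annotated as a noop section. */
static enum sink call_console_drivers_model(void)
{
	enum sink s;

	printk_noop_begin();
	s = vprintk_func_model();	/* a printk() issued by a console driver */
	printk_noop_end();

	return s;
}
```

As the thread notes, the cost of this approach is that a lockdep splat or WARN raised from a console driver's write path is dropped along with the repetitive messages.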
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-21 14:15 ` Sergey Senozhatsky
@ 2018-01-21 21:04 ` Steven Rostedt
2018-01-22 8:56 ` Sergey Senozhatsky
2018-01-23 6:40 ` Sergey Senozhatsky
1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-21 21:04 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Sun, 21 Jan 2018 23:15:21 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> so.... fix the console drivers ;)

Totally agree!

> 
> 
> 
> 
> just kidding. ok...

Darn it! ;-)

> the problem is that we flush printk_safe right when console_unlock() printing
> loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs
> were enabled in the first place when the CPU went to console_unlock()].
> this forces that CPU to loop in console_unlock() as long as we have
> printk-s coming from call_console_drivers(). but we probably can postpone
> printk_safe flush. basically, we can declare a new rule - we don't flush
> printk_safe buffer as long as console_sem is locked. because this is how
> that printing CPU stuck in the console_unlock() printing loop. printk_safe
> buffer is very important when it comes to storing a non-repetitive stuff, like
> a lockdep splat, which is a single shot event. but the more repetitive the
> message is, like millions of similar kmalloc() dump_stack()-s over and over
> again, the less value in it. we should have printk_safe buffer big enough for
> important info, like a lockdep splat, but millions of similar kmalloc()
> messages are of little value - one is already enough, we can drop the rest.
> and we should not flush new messages while there is a CPU looping in
> console_unlock(), because it already has messages to print, which were
> log_store()-ed the normal way.

The above is really hard to read without any capitalization. Everything
seems to be a run-on sentence and gives me a headache. So you lost me
there.

> 
> this is where the "postpone thing" jumps in. so how do we postpone printk_safe
> flush.
> 
> we can't console_trylock()/console_unlock() in printk_safe flush code.
> but there is a `console_locked' flag and is_console_locked() function which
> tell us if the console_sem is locked. as long as we are in console_unlock()
> printing loop that flag is set, even if we enabled local IRQs and printk_safe
> flush work arrived. so now printk_safe flush does extra check and does
> not flush printk_safe buffer content as long as someone is currently
> printing or soon will start printing. but we need to take extra step and
> to re-queue flush on CPUs that did postpone it [console_unlock() can
> reschedule]. so now we flush only when printing CPU printed all pending
> logbuf messages, hit the "console_seq == log_next_seq" and up()
> console_sem. this sets a boundary -- no matter how many times during the
> current printing loop we called console drivers and how many times those
> drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars.

Another big paragraph with no capitals (besides macros and CPU ;-)

I guess this is what it is like when people listen to me talk too fast.

> 
> 
> IOW, what we have now, looks like this:
> 
> a) printk_safe is for important stuff, we don't guarantee that a flood
>    of messages will be preserved.
> 
> b) we extend the previously existing "will flush messages later on from
>    a safer context" and now we also consider console_unlock() printing loop
>    as unsafe context. so the unsafe context is not only the one that can
>    deadlock, but also the one that can lockup CPU in a printing loop because
>    of recursive printk messages.

Sure.

> 
> 
> so this
> 
>  printk
>   console_unlock
>   {
>      for (;;) {
>         call_console_drivers
>            net_console
>               printk
>                  printk_safe -> irq_work queue
> 
> 		   IRQ work
> 		     printk_safe_flush
> 		       printk_deferred -> log_store()
> 		   iret
>      }
>      up();
>   }
> 
> 
>    // which can never break out, because we can always append new messages
>    // from printk_safe_flush.
> 
> becomes this
> 
>  printk
>   console_unlock
>   {
>      for (;;) {
>         call_console_drivers
>            net_console
>               printk
>                  printk_safe -> irq_work queue
> 
>      }
>      up();
> 
>    IRQ work
>      printk_safe_flush
>        printk_deferred -> log_store()
>    iret
>   }

But we do eventually send this data out to the consoles, and if the
consoles cause more printks, wouldn't this still never end?

> 
> 
> 
> something completely untested, sketchy and ugly.
> 
> ---
> 
>  kernel/printk/internal.h    |  2 ++
>  kernel/printk/printk.c      |  1 +
>  kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 2a7d04049af4..e85517818a49 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
>  void __printk_safe_enter(void);
>  void __printk_safe_exit(void);
>  
> +void printk_safe_requeue_flushing(void);
> +
>  #define printk_safe_enter_irqsave(flags)	\
>  	do {					\
>  		local_irq_save(flags);		\
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 9cb943c90d98..7aca23e8d7b2 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2428,6 +2428,7 @@ void console_unlock(void)
>  	raw_spin_lock(&logbuf_lock);
>  	retry = console_seq != log_next_seq;
>  	raw_spin_unlock(&logbuf_lock);
> +	printk_safe_requeue_flushing();
>  	printk_safe_exit_irqrestore(flags);
>  
>  	if (retry && console_trylock())
> diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
> index 3e3c2004bb23..45d5b292d7e1 100644
> --- a/kernel/printk/printk_safe.c
> +++ b/kernel/printk/printk_safe.c
> @@ -22,6 +22,7 @@
>  #include <linux/cpumask.h>
>  #include <linux/irq_work.h>
>  #include <linux/printk.h>
> +#include <linux/console.h>
>  
>  #include "internal.h"
>  
> @@ -51,6 +52,7 @@ struct printk_safe_seq_buf {
>  	atomic_t		message_lost;
>  	struct irq_work		work;	/* IRQ work that flushes the buffer */
>  	unsigned char		buffer[SAFE_LOG_BUF_LEN];
> +	bool			need_requeue;
>  };
>  
>  static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq);
> @@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work)
>  	size_t len;
>  	int i;
>  
> +	s->need_requeue = false;
>  	/*
>  	 * The lock has two functions. First, one reader has to flush all
>  	 * available message to make the lockless synchronization with
> @@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work)
>  	raw_spin_unlock_irqrestore(&read_lock, flags);
>  }
>  
> +/* NMI buffers are always flushed */
> +static void flush_nmi_buffer(struct irq_work *work)
> +{
> +	__printk_safe_flush(work);
> +}
> +
> +/* printk_safe buffers flushing, on the contrary, can be postponed */
> +static void flush_printk_safe_buffer(struct irq_work *work)
> +{
> +	struct printk_safe_seq_buf *s =
> +		container_of(work, struct printk_safe_seq_buf, work);
> +
> +	if (is_console_locked()) {
> +		s->need_requeue = true;
> +		return;
> +	}
> +
> +	__printk_safe_flush(work);
> +}
> +
> +void printk_safe_requeue_flushing(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (per_cpu(safe_print_seq, cpu).need_requeue)
> +			queue_flush_work(&per_cpu(safe_print_seq, cpu));
> +	}
> +}
> +
>  /**
>   * printk_safe_flush - flush all per-cpu nmi buffers.
>   *
> @@ -387,11 +420,11 @@ void __init printk_safe_init(void)
>  		struct printk_safe_seq_buf *s;
>  
>  		s = &per_cpu(safe_print_seq, cpu);
> -		init_irq_work(&s->work, __printk_safe_flush);
> +		init_irq_work(&s->work, flush_printk_safe_buffer);
>  
>  #ifdef CONFIG_PRINTK_NMI
>  		s = &per_cpu(nmi_print_seq, cpu);
> -		init_irq_work(&s->work, __printk_safe_flush);
> +		init_irq_work(&s->work, flush_nmi_buffer);
>  #endif
>  	}
>  
> ---
> 
> 
> > > lose all of them now? then we can do a much simpler thing - steal one
> > > bit from `printk_context' and use it for a new PRINTK_NOOP_CONTEXT, which
> > > will be set around call_console_drivers(). vprintk_func() would redirect
> > > printks to vprintk_noop(fmt, args), which will do nothing.
> > 
> > Not sure what you mean here. Have some pseudo code to demonstrate with?
> 
> sure, I meant that if we want to disable printk recursion from
> call_console_drivers(), then we can add another printk_safe section, say
> printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP
> bit of `printk_context', so when we have printk() under PRINTK_NOOP
> then vprintk_func() goes to a special vprintk_noop(fmt, args), which
> simply drops the message [does not store any in the per-cpu printk
> safe buffer, so we don't flush it and don't add new messages to the
> logbuf]. and we annotate call_console_drivers() as a printk_noop
> function. but that's a no-brainer and I'd prefer to have another solution.
> 

Another big paragraph without caps, but I figured it out.

I say we try that solution and see if it fixes the current issues.
Because right now, the bug I see Tejun presented was if something in
printk causes printks, it will start a printk bomb and lock up the
system.

The only reasonable answer I see to that is to throttle printk in such
a case.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-21 21:04 ` Steven Rostedt @ 2018-01-22 8:56 ` Sergey Senozhatsky 2018-01-22 10:28 ` Sergey Senozhatsky 0 siblings, 1 reply; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-22 8:56 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/21/18 16:04), Steven Rostedt wrote: [..] > > The problem is that we flush printk_safe right when console_unlock() printing > > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > > were enabled in the first place when the CPU went to console_unlock()]. > > This forces that CPU to loop in console_unlock() as long as we have > > printk-s coming from call_console_drivers(). But we probably can postpone > > printk_safe flush. Basically, we can declare a new rule - we don't flush > > printk_safe buffer as long as console_sem is locked. Because this is how > > that printing CPU stuck in the console_unlock() printing loop. printk_safe > > buffer is very important when it comes to storing a non-repetitive stuff, like > > a lockdep splat, which is a single shot event. But the more repetitive the > > message is, like millions of similar kmalloc() dump_stack()-s over and over > > again, the less value in it. We should have printk_safe buffer big enough for > > important info, like a lockdep splat, but millions of similar kmalloc() > > messages are pretty invaluable - one is already enough, we can drop the rest. > > And we should not flush new messages while there is a CPU looping in > > console_unlock(), because it already has messages to print, which were > > log_store()-ed the normal way. 
> > The above is really hard to read without any capitalization. Everything > seems to be a run-on sentence and gives me a head ache. So you lost me > there. Apologies. Will improve. > > This is where the "postpone thing" jumps in. so how do we postpone printk_safe > > flush. > > > > We can't console_trylock()/console_unlock() in printk_safe flush code. > > But there is a `console_locked' flag and is_console_locked() function which > > tell us if the console_sem is locked. As long as we are in console_unlock() > > printing loop that flag is set, even if we enabled local IRQs and printk_safe > > flush work arrived. So now printk_safe flush does extra check and does > > not flush printk_safe buffer content as long as someone is currently > > printing or soon will start printing. But we need to take extra step and > > to re-queue flush on CPUs that did postpone it [console_unlock() can > > reschedule]. So now we flush only when printing CPU printed all pending > > logbuf messages, hit the "console_seq == log_next_seq" and up() > > console_sem. This sets a boundary -- no matter how many times during the > > current printing loop we called console drivers and how many times those > > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. > > Another big paragraph with no capitals (besides macros and CPU ;-) I walked through it and mostly "fixed" your head ache :) > I guess this is what it is like when people listen to me talk too fast. Absolutely!!! > > IOW, what we have now, looks like this: > > > > a) printk_safe is for important stuff, we don't guarantee that a flood > > of messages will be preserved. > > > > b) we extend the previously existing "will flush messages later on from > > a safer context" and now we also consider console_unlock() printing loop > > as unsafe context. so the unsafe context it's not only the one that can > > deadlock, but also the one that can lockup CPU in a printing loop because > > of recursive printk messages. 
> > Sure. > > > > > > > so this > > > > printk > > console_unlock > > { > > for (;;) { > > call_console_drivers > > net_console > > printk > > printk_save -> irq_work queue > > > > IRQ work > > prink_safe_flush > > printk_deferred -> log_store() > > iret > > } > > up(); > > } > > > > > > // which can never break out, because we can always append new messages > > // from prink_safe_flush. > > > > becomes this > > > > printk > > console_unlock > > { > > for (;;) { > > call_console_drivers > > net_console > > printk > > printk_save -> irq_work queue > > > > } > > up(); > > > > IRQ work > > prink_safe_flush > > printk_deferred -> log_store() > > iret > > } > > But we do eventually send this data out to the consoles, and if the > consoles cause more printks, wouldn't this still never end? Right. But not immediately. We wait for all pending messages to be evicted first (and up()) and we limit the amount of data that we flush. So at least it's not exponential anymore: every line that we print does not log_store() a whole new dump_stack() of lines. Which is still miles away from "a perfect solution", tho. But limiting the number of lines we print recursive is not much better. First, we don't know how many lines we want to flush from printk_safe. And having a knob indicates that no one ever will do it right. Second, hand off can play games with it. Assume the following, - I set `recursion_max' to 200. Which looks reasonable to me. 
Then I have the following ping-pong: CPU0 CPU1 printk() recursion_check_start() call_console_drivers() printk() recursion_check_start() dump_stack() console_trylock_spinning() flush_printk_safe() spinning_disable_and_check() //handoff recursion_check_finish() // reset call_console_drivers() dump_stack() flush_printk_safe() printk() recursion_check_start() console_trylock_spinning() spinning_disable_and_check() // handoff recursion_check_finish() // reset call_console_drivers() printk dump_stack() recursion_check_start() flush_printk_safe() console_trylock_spinning() spinning_disable_and_check() recursion_check_finish() // reset call_console_drivers() ... And so on. So it's - take the lock, call console drivers, fill up the printk_safe buffer, flush it completely, hand off printing to another CPU, reset this CPU's recursion counter, repeat everything again. Every line of dump_stack() which we print adds another dump_stack() lines. Sergey "no-time-for-capitals" Senozhatsky ^ permalink raw reply [flat|nested] 140+ messages in thread
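The ping-pong in the message above can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions, not kernel code: `struct cpu_state`, `recursion_check()`, `recursion_check_finish()` and `simulate_pingpong()` are hypothetical names that loosely follow the identifiers in the trace, and each console owner is assumed to trigger one fixed-size recursive dump before it hands off printing.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state: recursive lines seen by the current owner. */
struct cpu_state {
	unsigned int recursion_count;
};

/* Returns true if one more recursive line is allowed on this CPU. */
static bool recursion_check(struct cpu_state *cpu, unsigned int recursion_max)
{
	if (cpu->recursion_count >= recursion_max)
		return false;
	cpu->recursion_count++;
	return true;
}

/* Models the reset that happens when printing is handed off. */
static void recursion_check_finish(struct cpu_state *cpu)
{
	cpu->recursion_count = 0;
}

/*
 * Two CPUs take turns owning the console. Each owner causes
 * lines_per_dump recursive lines, then hands off and resets its counter.
 * Returns the total number of recursive lines that passed the check.
 */
unsigned long simulate_pingpong(unsigned int recursion_max,
				unsigned int lines_per_dump,
				unsigned int handoffs)
{
	struct cpu_state cpus[2] = { { 0 }, { 0 } };
	unsigned long stored = 0;
	unsigned int owner = 0;

	for (unsigned int h = 0; h < handoffs; h++) {
		for (unsigned int i = 0; i < lines_per_dump; i++) {
			if (recursion_check(&cpus[owner], recursion_max))
				stored++;
		}
		recursion_check_finish(&cpus[owner]);	/* hand off: reset */
		owner ^= 1;				/* other CPU takes over */
	}
	return stored;
}
```

With a limit of 200 and dumps shorter than 200 lines, the per-owner limit is never reached, so the total grows linearly with the number of hand-offs. That is the point being argued: a counter that is reset on hand-off does not bound the overall amount of recursive output.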
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-22 8:56 ` Sergey Senozhatsky @ 2018-01-22 10:28 ` Sergey Senozhatsky 2018-01-22 10:36 ` Sergey Senozhatsky 0 siblings, 1 reply; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-22 10:28 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/22/18 17:56), Sergey Senozhatsky wrote: [..] > Assume the following, But more importantly we are missing another huge thing - console_unlock(). Suppose: console_lock(); << preemption >> printk printk .. printk console_unlock() for (;;) { call_console_drivers() dump_stack queue IRQ work IRQ work >> flush_printk_safe printk_deferred() ... printk_deferred() << iret } This should explode: sleepable console_unlock() may reschedule, printk_safe flush bypasses recursion checks. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-22 10:28 ` Sergey Senozhatsky
@ 2018-01-22 10:36 ` Sergey Senozhatsky
0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-22 10:36 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, Petr Mladek, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel
On (01/22/18 19:28), Sergey Senozhatsky wrote:
> On (01/22/18 17:56), Sergey Senozhatsky wrote:
> [..]
> > Assume the following,
>
> But more importantly we are missing another huge thing - console_unlock().
IOW, not every console_unlock() is from vprintk_emit(). We can have
console_trylock() -> console_unlock() being from non-preemptible context,
etc. And then irq work to flush printk_safe -> printk_deferred all the time.
-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-21 14:15 ` Sergey Senozhatsky 2018-01-21 21:04 ` Steven Rostedt @ 2018-01-23 6:40 ` Sergey Senozhatsky 2018-01-23 7:05 ` Sergey Senozhatsky ` (2 more replies) 1 sibling, 3 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-23 6:40 UTC (permalink / raw) To: Petr Mladek, Tejun Heo, Steven Rostedt Cc: Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky Hello, On (01/21/18 23:15), Sergey Senozhatsky wrote: [..] > we have printk recursion from console drivers. it's redirected to > printk_safe and we queue an IRQ work to flush the buffer > > printk > console_unlock > call_console_drivers > net_console > printk > printk_save -> irq_work queue > > now console_unlock() enables local IRQs, we have the printk_safe > flush. but printk_safe flush does not call into the console_unlock(), > it uses printk_deferred() version of printk > > IRQ work > > prink_safe_flush > printk_deferred -> irq_work queue > > > so we schedule another IRQ work (deferred printk work), which eventually > tries to lock console_sem > > IRQ work > wake_up_klogd_work_func() > if (console_trylock()) > console_unlock() Why do we even use irq_work for printk_safe? Okay... So, how about this. For printk_safe we use system_wq for flushing. IOW, we flush from a task running exactly on the same CPU which hit printk recursion, not from IRQ. From vprintk_safe() recursion, we queue work on *that* CPU. Which gives us the following thing: if CPU stuck in console_unlock() loop with preemption disabled, then system_wq does not schedule on that CPU and we, thus, don't flush printk_safe buffer from that CPU. 
But if CPU can reschedule, then we are kinda OK to flush printk_safe buffer, printing extra messages from that CPU will not lock it up, because it's in preemptible context. Thoughts? Something like this: From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH] printk/safe: use slowpath flush for printk_safe --- kernel/printk/printk_safe.c | 53 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 48 insertions(+), 5 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..c641853a5fa9 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,8 @@ #include <linux/cpumask.h> #include <linux/irq_work.h> #include <linux/printk.h> +#include <linux/console.h> +#include <linux/workqueue.h> #include "internal.h" @@ -50,6 +52,7 @@ struct printk_safe_seq_buf { atomic_t len; /* length of written data */ atomic_t message_lost; struct irq_work work; /* IRQ work that flushes the buffer */ + struct work_struct slowpath_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,12 +64,20 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) irq_work_queue(&s->work); } +static void queue_slowpath_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + &s->slowpath_flush_work); +} + /* * Add a message to per-CPU context-dependent buffer. NMI and printk-safe * have dedicated buffers, because otherwise printk-safe preempted by @@ -89,7 +100,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. 
*/ if (len >= sizeof(s->buffer) - 1) { atomic_inc(&s->message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +123,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(&s->len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -243,6 +253,35 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(&read_lock, flags); } +/* NMI buffers are always flushed */ +static void flush_nmi_buffer(struct irq_work *work) +{ + __printk_safe_flush(work); +} + +/* printk_safe buffers flushing, on the contrary, can be postponed */ +static void flush_printk_safe_buffer(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, work); + + if (is_console_locked()) { + queue_slowpath_flush_work(s); + return; + } + + __printk_safe_flush(work); +} + +static void slowpath_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, + slowpath_flush_work); + + __printk_safe_flush(&s->work); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. 
* @@ -300,6 +339,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); + queue_irq_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -343,6 +383,7 @@ static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq); + queue_slowpath_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -387,11 +428,13 @@ void __init printk_safe_init(void) struct printk_safe_seq_buf *s; s = &per_cpu(safe_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->work, flush_printk_safe_buffer); + INIT_WORK(&s->slowpath_flush_work, slowpath_flush_work_fn); #ifdef CONFIG_PRINTK_NMI s = &per_cpu(nmi_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->work, flush_nmi_buffer); + /* we don't use slowpath flush for NMI */ #endif } -- 2.16.1 ^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 6:40 ` Sergey Senozhatsky @ 2018-01-23 7:05 ` Sergey Senozhatsky 2018-01-23 7:31 ` Sergey Senozhatsky 2018-01-23 14:56 ` Steven Rostedt 2 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-23 7:05 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Steven Rostedt, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky On (01/23/18 15:40), Sergey Senozhatsky wrote: [..] > Why do we even use irq_work for printk_safe? > > Okay... So, how about this. For printk_safe we use system_wq for flushing. > IOW, we flush from a task running exactly on the same CPU which hit printk > recursion, not from IRQ. From vprintk_safe() recursion, we queue work on > *that* CPU. Which gives us the following thing: if CPU stuck in > console_unlock() loop with preemption disabled, then system_wq does not > schedule on that CPU and we, thus, don't flush printk_safe buffer from that > CPU. But if CPU can reschedule, then we are kinda OK to flush printk_safe > buffer, printing extra messages from that CPU will not lock it up, because > it's in preemptible context. > > Thoughts? 
A slightly reworked version: a) Do not check console_locked b) Do not have irq_work fast path for printk_safe buffer c) Which lets to union WQ/IRQ work structs - we use only IRQ work for NMI buffers, and only WQ work for SAFE buffers d) And also to refactor the code From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH] printk/safe: use system_wq to flush printk_safe buffers --- kernel/printk/printk_safe.c | 52 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..6c8c82cedccb 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include <linux/cpumask.h> #include <linux/irq_work.h> #include <linux/printk.h> +#include <linux/workqueue.h> #include "internal.h" @@ -49,7 +50,12 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_t len; /* length of written data */ atomic_t message_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + union { + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; + }; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +67,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. */ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(&s->work); + irq_work_queue(&s->irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + &s->wq_flush_work); } /* @@ -89,7 +103,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. 
*/ if (len >= sizeof(s->buffer) - 1) { atomic_inc(&s->message_lost); - queue_flush_work(s); return 0; } @@ -112,7 +125,6 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(&s->len, len, len + add) != len) goto again; - queue_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,22 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(&read_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +static void wq_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, wq_flush_work); + + __printk_safe_flush(s); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. 
* @@ -256,9 +282,9 @@ void printk_safe_flush(void) for_each_possible_cpu(cpu) { #ifdef CONFIG_PRINTK_NMI - __printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(&nmi_print_seq)); #endif - __printk_safe_flush(&per_cpu(safe_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(&safe_print_seq)); } } @@ -300,6 +326,7 @@ static __printf(1, 0) int vprintk_nmi(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq); + queue_irq_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -343,6 +370,7 @@ static __printf(1, 0) int vprintk_safe(const char *fmt, va_list args) { struct printk_safe_seq_buf *s = this_cpu_ptr(&safe_print_seq); + queue_wq_flush_work(s); return printk_safe_log_store(s, fmt, args); } @@ -387,11 +415,11 @@ void __init printk_safe_init(void) struct printk_safe_seq_buf *s; s = &per_cpu(safe_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + INIT_WORK(&s->wq_flush_work, wq_flush_work_fn); #ifdef CONFIG_PRINTK_NMI s = &per_cpu(nmi_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->irq_flush_work, irq_flush_work_fn); #endif } -- 2.16.1 ^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 6:40 ` Sergey Senozhatsky
2018-01-23 7:05 ` Sergey Senozhatsky
@ 2018-01-23 7:31 ` Sergey Senozhatsky
2018-01-23 14:56 ` Steven Rostedt
2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-23 7:31 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Tejun Heo, Steven Rostedt, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel, Sergey Senozhatsky
On (01/23/18 15:40), Sergey Senozhatsky wrote:
>
> Why do we even use irq_work for printk_safe?
>
... perhaps because of
wq: pool->lock -> printk -> call_console_drivers -> printk -> vprintk_safe -> wq: pool->lock
Which is a "many things have gone wrong" type of scenario. Maybe we can
work around it somehow, hm. Tejun, can we have lockless WQ? ;)
-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 6:40 ` Sergey Senozhatsky
2018-01-23 7:05 ` Sergey Senozhatsky
2018-01-23 7:31 ` Sergey Senozhatsky
@ 2018-01-23 14:56 ` Steven Rostedt
2018-01-23 15:21 ` Sergey Senozhatsky
2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 14:56 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen,
Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel,
Sergey Senozhatsky
On Tue, 23 Jan 2018 15:40:23 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
> Why do we even use irq_work for printk_safe?
Why not?
Really, I think you are trying to solve a symptom and not the problem.
If we are having issues with irq_work, we are going to have issues with
a work queue. It's just spreading out the problem instead of fixing it.
-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 14:56 ` Steven Rostedt @ 2018-01-23 15:21 ` Sergey Senozhatsky 2018-01-23 15:41 ` Steven Rostedt 0 siblings, 1 reply; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-23 15:21 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel, Sergey Senozhatsky On (01/23/18 09:56), Steven Rostedt wrote: [..] > > Why do we even use irq_work for printk_safe? > > Why not? > > Really, I think you are trying to solve a symptom and not the problem. > If we are having issues with irq_work, we are going to have issues with > a work queue. It's just spreading out the problem instead of fixing it. I don't want to have heuristics in print_safe, I don't want to have a magic number controlled by a user-space visible knob, I don't want to have the first 3 lines of a lockdep splat. The problem is - we flush printk_safe too soon and printing CPU ends up in a lockup - it log_store()-s new messages while it's printing the pending ones. It's fine to do so when CPU is in preemptible context. Really, we should not care in printk_safe as long as we don't lockup the kernel. The misbehaving console must be fixed. If CPU is not in preemptible context then we do lockup the kernel. Because we flush printk_safe regardless of the current CPU context. If we will flush printk_safe via WQ then we automatically add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we will not lockup it up." thing. Yes, we fill up the logbuf with probably needed and appreciated or unneeded messages. But we should not care in printk_safe. We don't lockup the kernel... And the misbehaving console must be fixed. 
I disagree with "If we are having issues with irq_work, we are going to have issues with a work queue". There is a tremendous difference between irq_work on that CPU and queue_work_on(smp_processor_id()). One does not care about CPU context, the other one does. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
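The difference claimed in the message above (irq_work runs regardless of CPU context, while a work item queued on the same CPU runs only once that CPU can schedule) can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: `enum flush_mode` and `printing_loop_iterations()` are hypothetical names, the console driver is assumed to emit exactly one recursive line per printed line, and the printing loop is assumed to run with preemption disabled the whole time.

```c
/* How the printk_safe buffer gets flushed in this toy model. */
enum flush_mode {
	FLUSH_IRQ_WORK,		/* runs as soon as local IRQs are enabled */
	FLUSH_WORKQUEUE,	/* runs only when the CPU can schedule */
};

/*
 * Models a non-preemptible console_unlock()-style loop.
 * pending:  lines waiting in the log buffer when the loop starts
 * max_iter: safety cap so that the endless case terminates in the model
 * Returns how many lines the loop printed before it stopped.
 */
unsigned int printing_loop_iterations(enum flush_mode mode,
				      unsigned int pending,
				      unsigned int max_iter)
{
	unsigned int safe_buf = 0;	/* lines parked in the safe buffer */
	unsigned int iter = 0;

	while (pending > 0 && iter < max_iter) {
		pending--;	/* print one line via the console driver */
		safe_buf++;	/* driver recursively printk()s one line */
		iter++;

		if (mode == FLUSH_IRQ_WORK) {
			/* IRQs are enabled between lines: flush right now */
			pending += safe_buf;
			safe_buf = 0;
		}
		/*
		 * FLUSH_WORKQUEUE: the worker cannot run while preemption
		 * is disabled, so nothing is appended during the loop.
		 */
	}
	return iter;
}
```

In this model the irq_work variant runs until the safety cap, because every printed line puts a new one back into the log buffer, while the workqueue variant stops after the lines that were pending at entry. The recursive lines are still there afterwards; they are only postponed to a context where printing them cannot lock up the CPU.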
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 15:21 ` Sergey Senozhatsky @ 2018-01-23 15:41 ` Steven Rostedt 2018-01-23 15:43 ` Tejun Heo 2018-01-23 16:01 ` Sergey Senozhatsky 0 siblings, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-23 15:41 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Wed, 24 Jan 2018 00:21:30 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > On (01/23/18 09:56), Steven Rostedt wrote: > [..] > > > Why do we even use irq_work for printk_safe? > > > > Why not? > > > > Really, I think you are trying to solve a symptom and not the problem. > > If we are having issues with irq_work, we are going to have issues with > > a work queue. It's just spreading out the problem instead of fixing it. > > I don't want to have heuristics in print_safe, I don't want to have a magic > number controlled by a user-space visible knob, I don't want to have the > first 3 lines of a lockdep splat. We can have more. But if printk is causing printks, that's a major bug. And work queues are not going to fix it, it will just spread out the pain. Have it be 100 printks, it needs to be fixed if it is happening. And having all printks just generate more printks is not helpful. Even if we slow them down. They will still never end. A printk causing a printk is a special case, and we need to just show enough to let the user know that its happening, and why printks are being throttled. Yes, we may lose data, but if every printk that goes out causes another printk, then there's going to be so much noise that we wont know what other things went wrong. 
Honestly, if someone showed me a report where the logs were filled with printks that caused printks, I'd stop right there and tell them that needs to be fixed before we do anything else. And if that recursion is happening because of another problem, I don't want to see the recursion printks. I want to see the printks that show what is causing the recursions. > The problem is - we flush printk_safe too soon and printing CPU ends up > in a lockup - it log_store()-s new messages while it's printing the pending No, the problem is that printks are causing more printks. Yes that will make flushing them soon more likely to lock up the system. But that's not the problem. The problem is printks causing printks. > ones. It's fine to do so when CPU is in preemptible context. Really, we > should not care in printk_safe as long as we don't lockup the kernel. The > misbehaving console must be fixed. If CPU is not in preemptible context then > we do lockup the kernel. Because we flush printk_safe regardless of the > current CPU context. If we will flush printk_safe via WQ then we automatically And if we can throttle recursive printks, then we should be able to stop that from happening. > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed > and appreciated or unneeded messages. But we should not care in printk_safe. > We don't lockup the kernel... And the misbehaving console must be fixed. I agree. > > I disagree with "If we are having issues with irq_work, we are going to have > issues with a work queue". There is a tremendous difference between irq_work > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU > context, the other one does. But switching to work queue does not address the underlining problem that printks are causing more printks. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 15:41 ` Steven Rostedt
@ 2018-01-23 15:43 ` Tejun Heo
2018-01-23 16:12 ` Sergey Senozhatsky
` (2 more replies)
2018-01-23 16:01 ` Sergey Senozhatsky
1 sibling, 3 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-23 15:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel
Hello, Steven.
On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote:
> > I don't want to have heuristics in print_safe, I don't want to have a magic
> > number controlled by a user-space visible knob, I don't want to have the
> > first 3 lines of a lockdep splat.
>
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.
So, at least in the case that we were seeing, it isn't that black and
white. printk keeps causing printks but only because printk buffer
flushing is preventing the printk'ing context from making forward
progress. The key problem there is that a flushing context may get
pinned flushing indefinitely and using a separate context does solve
the problem.
Thanks.
-- tejun
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 15:43 ` Tejun Heo @ 2018-01-23 16:12 ` Sergey Senozhatsky 2018-01-23 16:13 ` Steven Rostedt 2018-04-23 5:35 ` Sergey Senozhatsky 2 siblings, 0 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-23 16:12 UTC (permalink / raw) To: Tejun Heo Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel Hello, Tejun On (01/23/18 07:43), Tejun Heo wrote: > Hello, Steven. > > On Tue, Jan 23, 2018 at 10:41:21AM -0500, Steven Rostedt wrote: > > > I don't want to have heuristics in print_safe, I don't want to have a magic > > > number controlled by a user-space visible knob, I don't want to have the > > > first 3 lines of a lockdep splat. > > > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > So, at least in the case that we were seeing, it isn't that black and > white. printk keeps causing printks but only because printk buffer > flushing is preventing the printk'ing context from making forward > progress. The key problem there is that a flushing context may get > pinned flushing indefinitely and using a separate context does solve > the problem. Would you, as the original bug reporter, be OK if we flush printk_safe (only printk_safe, not printk_nmi for the time being) via WQ? This should move that "uncontrolled" flush to a safe context. 
I don't think we can easily add kthread offloading to printk at the moment (this will result in a massive gun fight). Just in case, below is something like a patch. I think I worked around the possible wq deadlock scenario. But I haven't tested the patch yet. It's a bit late here and I guess I need some rest. Will try to look more at it tomorrow. From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH] printk/safe: split flush works --- kernel/printk/printk_safe.c | 75 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 63 insertions(+), 12 deletions(-) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..54bc40ce3c34 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include <linux/cpumask.h> #include <linux/irq_work.h> #include <linux/printk.h> +#include <linux/workqueue.h> #include "internal.h" @@ -49,7 +50,10 @@ static int printk_safe_irq_ready __read_mostly; struct printk_safe_seq_buf { atomic_t len; /* length of written data */ atomic_t message_lost; - struct irq_work work; /* IRQ work that flushes the buffer */ + /* IRQ work that flushes NMI buffer */ + struct irq_work irq_flush_work; + /* WQ work that flushes SAFE buffer */ + struct work_struct wq_flush_work; unsigned char buffer[SAFE_LOG_BUF_LEN]; }; @@ -61,10 +65,18 @@ static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq); #endif /* Get flushed in a more safe context. 
*/ -static void queue_flush_work(struct printk_safe_seq_buf *s) +static void queue_irq_flush_work(struct printk_safe_seq_buf *s) { if (printk_safe_irq_ready) - irq_work_queue(&s->work); + irq_work_queue(&s->irq_flush_work); +} + +static void queue_wq_flush_work(struct printk_safe_seq_buf *s) +{ + if (printk_safe_irq_ready) + queue_work_on(smp_processor_id(), + system_wq, + &s->wq_flush_work); } /* @@ -89,7 +101,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, /* The trailing '\0' is not counted into len. */ if (len >= sizeof(s->buffer) - 1) { atomic_inc(&s->message_lost); - queue_flush_work(s); + queue_irq_flush_work(s); return 0; } @@ -112,7 +124,7 @@ static __printf(2, 0) int printk_safe_log_store(struct printk_safe_seq_buf *s, if (atomic_cmpxchg(&s->len, len, len + add) != len) goto again; - queue_flush_work(s); + queue_irq_flush_work(s); return add; } @@ -186,12 +198,10 @@ static void report_message_lost(struct printk_safe_seq_buf *s) * Flush data from the associated per-CPU buffer. The function * can be called either via IRQ work or independently. */ -static void __printk_safe_flush(struct irq_work *work) +static void __printk_safe_flush(struct printk_safe_seq_buf *s) { static raw_spinlock_t read_lock = __RAW_SPIN_LOCK_INITIALIZER(read_lock); - struct printk_safe_seq_buf *s = - container_of(work, struct printk_safe_seq_buf, work); unsigned long flags; size_t len; int i; @@ -243,6 +253,46 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(&read_lock, flags); } +static void irq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + __printk_safe_flush(s); +} + +/* + * We can't queue wq work directly from vprintk_safe(), because we can + * deadlock. 
For instance: + * + * queue_work() + * spin_lock(pool->lock) + * printk() + * call_console_drivers() + * vprintk_safe() + * queue_work() + * spin_lock(pool->lock) + * + * So we use irq_work, from which we queue wq work. WQ disables local IRQs + * while it works with pool, so if we have irq_work on that CPU then we can + * expect that pool->lock is not locked. + */ +static void irq_to_wq_flush_work_fn(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, irq_flush_work); + + queue_wq_flush_work(s); +} + +static void wq_flush_work_fn(struct work_struct *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, wq_flush_work); + + __printk_safe_flush(s); +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -256,9 +306,9 @@ void printk_safe_flush(void) for_each_possible_cpu(cpu) { #ifdef CONFIG_PRINTK_NMI - __printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(&nmi_print_seq)); #endif - __printk_safe_flush(&per_cpu(safe_print_seq, cpu).work); + __printk_safe_flush(this_cpu_ptr(&safe_print_seq)); } } @@ -387,11 +437,12 @@ void __init printk_safe_init(void) struct printk_safe_seq_buf *s; s = &per_cpu(safe_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->irq_flush_work, irq_to_wq_flush_work_fn); + INIT_WORK(&s->wq_flush_work, wq_flush_work_fn); #ifdef CONFIG_PRINTK_NMI s = &per_cpu(nmi_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->irq_flush_work, irq_flush_work_fn); #endif } -- 2.16.1 ^ permalink raw reply related [flat|nested] 140+ messages in thread
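The reasoning in the patch comment above (queueing work directly from the recursive printk path can deadlock on pool->lock, so the patch raises irq_work first and queues the work from there) can be sketched as a toy state machine. Everything here is a hypothetical user-space model, not the workqueue API: `struct toy_cpu`, `toy_queue_work()`, `toy_vprintk_safe()` and `toy_unlock_pool()` are illustrative names, and the model assumes, as the comment does, that the pool lock is held with local IRQs disabled so that irq_work can only run after the lock is dropped.

```c
#include <stdbool.h>

struct toy_cpu {
	bool pool_lock_held;	/* models wq pool->lock, taken with IRQs off */
	bool irq_work_pending;	/* irq_work raised but not yet run */
	bool wq_work_queued;	/* flush work successfully queued */
	bool deadlocked;	/* tried to take a lock we already hold */
};

/* Models queueing a work item: it needs the pool lock. */
static void toy_queue_work(struct toy_cpu *c)
{
	if (c->pool_lock_held) {
		c->deadlocked = true;	/* would spin on our own lock */
		return;
	}
	c->wq_work_queued = true;
}

/* Models a recursive printk that lands in the printk_safe path. */
void toy_vprintk_safe(struct toy_cpu *c, bool via_irq_work)
{
	if (via_irq_work)
		c->irq_work_pending = true;	/* defer; touch no locks */
	else
		toy_queue_work(c);		/* queue directly */
}

/* Models dropping the pool lock and re-enabling IRQs. */
void toy_unlock_pool(struct toy_cpu *c)
{
	c->pool_lock_held = false;
	if (c->irq_work_pending) {
		/* irq_work runs now and queues the real flush work */
		c->irq_work_pending = false;
		toy_queue_work(c);
	}
}
```

The direct path trips the deadlock flag when the recursion starts under the pool lock; the two-stage path only marks irq_work as pending and queues the flush once the lock has been released.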
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 15:43 ` Tejun Heo
2018-01-23 16:12 ` Sergey Senozhatsky
@ 2018-01-23 16:13 ` Steven Rostedt
2018-01-23 17:21 ` Tejun Heo
2018-04-23 5:35 ` Sergey Senozhatsky
2 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-23 16:13 UTC (permalink / raw)
To: Tejun Heo
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo <tj@kernel.org> wrote:

> So, at least in the case that we were seeing, it isn't that black and
> white. printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress. The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>

Does it?

From what I understand is that there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
within a printk recursively. It gets put into the safe buffer and an
irq is sent to printk this printk.

The issue you are saying is that when the printk enables interrupts,
the irq work triggers and loads the log buffer with the safe buffer, and
then the printk sees the new data added and continues to print, and
hence never leaves this printk.

Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP mantra". Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.

Again, this is treating the symptom and not solving the problem.

I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.

I much rather have throttling when recursive printks are detected.
Make it a 100 lines to print if you want, but then throttle. Because
once you have 100 lines or so, you will know that printks are causing
printks, and you don't give a crap about the repeated process. Allow
one flushing of the printk safe buffers, and then if it happens again,
throttle it.

Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose important data that caused
a lockup.

-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
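As a rough illustration of the throttling idea in the message above (let a bounded number of recursive messages through, then suppress the rest), here is a minimal user-space sketch. It is not kernel code and is not taken from any patch in this thread: `RECURSION_BUDGET`, `struct recursion_throttle` and both helpers are hypothetical names, and the budget of 100 only mirrors the "100 lines or so" figure mentioned in the message. When the counters get reset is deliberately left to the caller.

```c
#include <stdbool.h>

/* Hypothetical budget; mirrors the "100 lines or so" figure from the thread. */
#define RECURSION_BUDGET 100

/* Hypothetical bookkeeping for messages generated by printk itself. */
struct recursion_throttle {
	unsigned int emitted;	/* recursive messages let through so far */
	unsigned int dropped;	/* recursive messages suppressed so far */
};

/* Return true while the recursive message may still be logged. */
static bool recursion_throttle_allow(struct recursion_throttle *t)
{
	if (t->emitted < RECURSION_BUDGET) {
		t->emitted++;
		return true;
	}
	t->dropped++;
	return false;
}

/*
 * Clear the counters and return how many messages were dropped, so the
 * caller could report the loss. The reset policy is up to the caller.
 */
static unsigned int recursion_throttle_reset(struct recursion_throttle *t)
{
	unsigned int dropped = t->dropped;

	t->emitted = 0;
	t->dropped = 0;
	return dropped;
}
```

A caller would check `recursion_throttle_allow()` before storing a message that was produced while printing, and call `recursion_throttle_reset()` at whatever point it considers the recursion to be over.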
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 16:13 ` Steven Rostedt @ 2018-01-23 17:21 ` Tejun Heo 0 siblings, 0 replies; 140+ messages in thread From: Tejun Heo @ 2018-01-23 17:21 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel Hey, On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote: > From what I understand is that there's an issue with one of the printk > consoles, due to memory pressure or whatnot. Then a printk happens > within a printk recursively. It gets put into the safe buffer and an > irq is sent to printk this printk. > > The issue you are saying is that when the printk enables interrupts, > the irq work triggers and loads the log buffer with the safe buffer, and > then the printk sees the new data added and continues to print, and > hence never leaves this printk. I'm not sure it's irq or the same calling context, but yeah whatever it may be, it keeps adding new data. > Your solution is to delay the flushing of the safe buffer to another > thread (work queue), which I also have issues with, because you break > the "get printks out ASAP mantra". Then the work queue comes in and > flushes the printks. And since the printks cause printks, we continue > to spam the machine, but hey, we are making forward progress. I'm not sure "get printks out ASAP mantra" is the overriding concern after spending 20s flushing in an unknown context. I'm honestly curious. Would that still matter that much at that point? I went through the recent common crashes in the fleet earlier today and a good number of them are printk taking too long unnecessarily escalating the situation (most commonly triggering NMI watchdog). 
I'm not saying that this should override other concerns but it seems clear to me that we're pretty badly exposed on this front. > Again, this is treating the symptom and not solving the problem. Or adding a safety net when things go south, but this isn't what I was trying to argue. I mostly thought your understanding of what I reported wasn't accurate and wanted to clear that up. > I really hate delaying printks to another thread, unless we can > guarantee that that thread is ready to go immediately (basically > spinning on a run queue waiting to print). Because if the system is > having issues (which is the main reason for printks to happen), there's > no guarantee that a work queue or another thread will ever schedule, > and the safe printk buffer never gets out to the consoles. > > I much rather have throttling when recursive printks are detected. > Make it a 100 lines to print if you want, but then throttle. Because > once you have 100 lines or so, you will know that printks are causing > printks, and you don't give a crap about the repeated process. Allow > one flushing of the printk safe buffers, and then if it happens again, > throttle it. > > Both methods can lose important data. I believe the throttling of > recursive printks, after 100 prints or whatever, will be the least > likely to lose important data, because printks caused by printks will > just keep repeating the same data, and we don't care about repeats. But > delaying the flushing could very well lose important data that caused > a lockup. Hmmm... what you're suggesting still seems more fragile - ie. when does that 100 count get reset? OOM prints quite a few lines and if we're resetting on each line, that two order explosion of messages can still be really really bad. And issues like that seem to suggest that the root problem to handle here is avoiding locking up a context in flushing for too long. 
Your approach is trying to avoid causing that but it's a symptom which can be reached in many different ways. Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 15:43 ` Tejun Heo
2018-01-23 16:12 ` Sergey Senozhatsky
2018-01-23 16:13 ` Steven Rostedt
@ 2018-04-23 5:35 ` Sergey Senozhatsky
2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-04-23 5:35 UTC (permalink / raw)
To: Tejun Heo
Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/23/18 07:43), Tejun Heo wrote:
> >
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
>
> So, at least in the case that we were seeing, it isn't that black and
> white. printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress. The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.

Hello Tejun,

I'm willing to take a look at those printk()-s from console drivers.
Any chance you can send me some of the backtraces you see [the most
common/disturbing]?

-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 15:41 ` Steven Rostedt 2018-01-23 15:43 ` Tejun Heo @ 2018-01-23 16:01 ` Sergey Senozhatsky 2018-01-23 16:24 ` Steven Rostedt 2018-01-23 17:22 ` Tejun Heo 1 sibling, 2 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-23 16:01 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/23/18 10:41), Steven Rostedt wrote: [..] > We can have more. But if printk is causing printks, that's a major bug. > And work queues are not going to fix it, it will just spread out the > pain. Have it be 100 printks, it needs to be fixed if it is happening. > And having all printks just generate more printks is not helpful. Even > if we slow them down. They will still never end. Dropping the messages is not the solution either. The original bug report report was - this "locks up my kernel". That's it. That's all people asked us to solve. With WQ we don't lockup the kernel, because we flush printk_safe in preemptible context. And people are very much expected to fix the misbehaving consoles. But that should not be printk_safe problem. > A printk causing a printk is a special case, and we need to just show > enough to let the user know that its happening, and why printks are > being throttled. Yes, we may lose data, but if every printk that goes > out causes another printk, then there's going to be so much noise that > we wont know what other things went wrong. Honestly, if someone showed > me a report where the logs were filled with printks that caused > printks, I'd stop right there and tell them that needs to be fixed > before we do anything else. 
And if that recursion is happening because > of another problem, I don't want to see the recursion printks. I want > to see the printks that show what is causing the recursions. I'll re-read this one tomorrow. Not quite following it. > > The problem is - we flush printk_safe too soon and printing CPU ends up > > in a lockup - it log_store()-s new messages while it's printing the pending > > No, the problem is that printks are causing more printks. Yes that will > make flushing them soon more likely to lock up the system. But that's > not the problem. The problem is printks causing printks. Yes. And ignoring those printk()-s by simply dropping them does not fix the problem by any means. > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > should not care in printk_safe as long as we don't lockup the kernel. The > > misbehaving console must be fixed. If CPU is not in preemptible context then > > we do lockup the kernel. Because we flush printk_safe regardless of the > > current CPU context. If we will flush printk_safe via WQ then we automatically > > And if we can throttle recursive printks, then we should be able to > stop that from happening. pintk_safe was designed to be recursive. It was never designed to be used to troubleshoot or debug consoles. But it was designed to be recursive - because that's the sort of the problems it was meant to handle: recursive printks that would otherwise deadlock us. That's why we have it in the first place. > > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we > > will not lockup it up." thing. Yes, we fill up the logbuf with probably needed > > and appreciated or unneeded messages. But we should not care in printk_safe. > > We don't lockup the kernel... And the misbehaving console must be fixed. > > I agree. Good. > > I disagree with "If we are having issues with irq_work, we are going to have > > issues with a work queue". 
There is a tremendous difference between irq_work > > on that CPU and queue_work_on(smp_proessor_id()). One does not care about CPU > > context, the other one does. > > But switching to work queue does not address the underlining problem > that printks are causing more printks. The only way to address those problems is to fix the console. That's the only. But that's not what I'm doing with my proposal. I fix the lockup scenario, the only reported problem so far. Whilst also keeping printk_safe around. -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-23 16:01 ` Sergey Senozhatsky @ 2018-01-23 16:24 ` Steven Rostedt 2018-01-24 2:11 ` Sergey Senozhatsky 2018-01-23 17:22 ` Tejun Heo 1 sibling, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-23 16:24 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Wed, 24 Jan 2018 01:01:53 +0900 Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote: > On (01/23/18 10:41), Steven Rostedt wrote: > [..] > > We can have more. But if printk is causing printks, that's a major bug. > > And work queues are not going to fix it, it will just spread out the > > pain. Have it be 100 printks, it needs to be fixed if it is happening. > > And having all printks just generate more printks is not helpful. Even > > if we slow them down. They will still never end. > > Dropping the messages is not the solution either. The original bug report > report was - this "locks up my kernel". That's it. That's all people asked > us to solve. And throttling the printks would stop the lock up too. > > With WQ we don't lockup the kernel, because we flush printk_safe in > preemptible context. And people are very much expected to fix the > misbehaving consoles. But that should not be printk_safe problem. Right, but now you just made printk safe unreliable to get information out, because you need to wait for a schedule to occur, and if there's issues, like a deadlock, that thread will never run. And you just lost you lockdep splat. > > > A printk causing a printk is a special case, and we need to just show > > enough to let the user know that its happening, and why printks are > > being throttled. 
Yes, we may lose data, but if every printk that goes > > out causes another printk, then there's going to be so much noise that > > we wont know what other things went wrong. Honestly, if someone showed > > me a report where the logs were filled with printks that caused > > printks, I'd stop right there and tell them that needs to be fixed > > before we do anything else. And if that recursion is happening because > > of another problem, I don't want to see the recursion printks. I want > > to see the printks that show what is causing the recursions. > > I'll re-read this one tomorrow. Not quite following it. I'll add more capitals next time ;-) > > > > The problem is - we flush printk_safe too soon and printing CPU ends up > > > in a lockup - it log_store()-s new messages while it's printing the pending > > > > No, the problem is that printks are causing more printks. Yes that will > > make flushing them soon more likely to lock up the system. But that's > > not the problem. The problem is printks causing printks. > > Yes. And ignoring those printk()-s by simply dropping them does not fix > the problem by any means. How so? If we drop them, then the stuck printk has nothing to print and will move forward. I say once you start dropping printks due to recursion, keep dropping them. For at least a second, to allow them to stop killing the machine. > > > > ones. It's fine to do so when CPU is in preemptible context. Really, we > > > should not care in printk_safe as long as we don't lockup the kernel. The > > > misbehaving console must be fixed. If CPU is not in preemptible context then > > > we do lockup the kernel. Because we flush printk_safe regardless of the > > > current CPU context. If we will flush printk_safe via WQ then we automatically > > > > And if we can throttle recursive printks, then we should be able to > > stop that from happening. > > pintk_safe was designed to be recursive. It was never designed to be > used to troubleshoot or debug consoles. 
But it was designed to be > recursive - because that's the sort of the problems it was meant to > handle: recursive printks that would otherwise deadlock us. That's why > we have it in the first place. So printk safe is only triggered when at the same context? If we can guarantee that printk safe is triggered only when its because a printk is happening at the same context (not because of an interrupt, but really at the same context, using my context check), then I'm fine with delaying them to a work queue. That is, if we have this: printk() console_lock() <interrupt> printk() add to log buffer <print irq printk too> console_unlock(); printk() console_lock() <console does a printk> put in printk safe buffer trigger work queue console_unlock() <work queue> flush safe buffer printk() Then I'm fine with that. I have to look at the latest code. If this is indeed what we have, then I admit I misunderstood the problem you want to solve. I only want recursive printks (those that are actually triggered by doing a printk) to be allowed to be delayed. Make sense? -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 16:24 ` Steven Rostedt
@ 2018-01-24 2:11 ` Sergey Senozhatsky
2018-01-24 2:52 ` Steven Rostedt
0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-24 2:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Hello,

On (01/23/18 11:24), Steven Rostedt wrote:
[..]
> > With WQ we don't lockup the kernel, because we flush printk_safe in
> > preemptible context. And people are very much expected to fix the
> > misbehaving consoles. But that should not be printk_safe problem.
>
> Right, but now you just made printk safe unreliable to get information
> out, because you need to wait for a schedule to occur, and if there's
> issues, like a deadlock, that thread will never run. And you just lost
> you lockdep splat.

Yes and No. printk_safe and printk_nmi are unreliable - both need
irq_work. That's why we forcibly flush those buffers in panic(). At
least for printk_safe case, and I'm pretty sure the same stands for
printk_nmi, we never said that we will store all the messages that were
printed from unsafe context (recursion or NMI). The only thing we said -
we will try not to deadlock the system. Now we are adding one more thing
to printk_safe - we will also try not to lockup the system.

Default printk_safe buffer size might not be enough to store a very
large lockdep splat. And we will report that the buffer is too small and
that we lost some of the lines: "here is what we have, we lost N lines,
but at least we didn't deadlock the system".

See f975237b76827956fe13ecfe993a319158e2c303 for more details, it
contains a list of recursive-printk deadlock scenarios that printk_safe
was meant to handle.

It is possible and OK to lose messages in printk_safe/printk_nmi

	printk_safe_enter_irqsave()
	 printk
	 printk
	 ...
	 ...
	 printk
	 printk
	printk_safe_exit_irqrestore()

No flush will take place as long as there is no IRQ on that CPU. But
printk_safe and printk_nmi are solving different problem in the first
place.

> > I'll re-read this one tomorrow. Not quite following it.
>
> I'll add more capitals next time ;-)

Ha-ha-ha ;)

[..]
> > pintk_safe was designed to be recursive. It was never designed to be
> > used to troubleshoot or debug consoles. But it was designed to be
> > recursive - because that's the sort of the problems it was meant to
> > handle: recursive printks that would otherwise deadlock us. That's why
> > we have it in the first place.
>
> So printk safe is only triggered when at the same context? If we can
> guarantee that printk safe is triggered only when its because a printk
> is happening at the same context (not because of an interrupt, but
> really at the same context, using my context check), then I'm fine with
> delaying them to a work queue.

printk_safe is for printk recursion only. It happens in the same context
only. When we switch to printk_safe we disable local IRQs, NMIs have
their own printk_nmi thing. And the way we flush printk_safe is mostly
recursive. Because we flush when we know that we will not deadlock [as
much as we can; we can't control any 3rd party locks which might be
involved; thus printk_deferred() usage].

Usually it's something like

	printk
	 spin_lock_irqsave(logbuf_lock)
	  printk
	   spin_lock_irqsave(logbuf_lock) << deadlock

What we have with printk_safe is

	printk
	 local_irq_save
	 printk_safe_enter
	 spin_lock(logbuf_lock)
	  printk
	   vprintk_safe
	    queue irq work
	 spin_unlock(logbuf_lock)
	 printk_safe_exit
	 local_irq_restore
	>>> IRQ work
	 printk_safe_flush
	  printk
	   spin_lock_irqsave(logbuf_lock)
	   log_store()
	   spin_unlock_irqrestore(logbuf_lock)

So we flush printk_safe ASAP, which usually (unless originally we were
not in IRQ) means that the flush is recursive, but safe - we don't
deadlock.

> That is, if we have this:
>
>	printk()
>	  console_lock()
>	  <interrupt>
>	    printk()
>	      add to log buffer
>	  <print irq printk too>
>	  console_unlock();

Right. This is what we have right now. Every time we enable local IRQs
in the console_unlock() printing loop - we flush printk_safe. And that's
the problem.

>	printk()
>	  console_lock()
>	  <console does a printk>
>	    put in printk safe buffer
>	    trigger work queue
>	  console_unlock()
>	<work queue>
>	  flush safe buffer
>	    printk()

Right. This is what we will have with WQ. We don't flush printk_safe
until we return from console_unlock(). Because printk() disables
preemption for the duration of console_unlock(), we can't schedule WQ on
that CPU. And we schedule flushing work only on the CPU that has
triggered the recursion.

Another thing:

	console_lock()
	blah blah
	console_unlock()

In this case we will flush printk_safe within the printing loop.
Immediately. But we don't care - the CPU is preemptible, we don't lock
up the kernel.

> Then I'm fine with that.
>
> I have to look at the latest code. If this is indeed what we have, then
> I admit I misunderstood the problem you want to solve.
>
> I only want recursive printks (those that are actually triggered by
> doing a printk) to be allowed to be delayed.
>
> Make sense?

Please take a look.

-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
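The two-stage hand-off described in the message above (the printk_safe store queues irq_work, the irq_work handler queues workqueue work, and the actual flush only runs once the CPU can schedule) can be modelled in a few lines of plain C. This is a toy state machine written for illustration only, not kernel code: `struct cpu_model` and the `model_*` helpers are hypothetical names, and "preemptible" is reduced to a single flag.

```c
#include <stdbool.h>

/* Toy model of one CPU; none of these names exist in the kernel. */
struct cpu_model {
	bool preemptible;	/* false while inside the printing loop */
	bool irq_work_pending;	/* stage 1: set when a message is stored */
	bool wq_work_pending;	/* stage 2: set by the irq_work handler */
	int buffered;		/* messages in the per-CPU safe buffer */
	int flushed;		/* messages moved to the main log buffer */
};

/* A recursive printk: store the message and request stage 1. */
static void model_safe_store(struct cpu_model *c)
{
	c->buffered++;
	c->irq_work_pending = true;
}

/* An IRQ arrives: the handler does not flush, it only queues stage 2. */
static void model_irq(struct cpu_model *c)
{
	if (c->irq_work_pending) {
		c->irq_work_pending = false;
		c->wq_work_pending = true;
	}
}

/* A scheduling point: the worker runs only if the CPU is preemptible. */
static void model_schedule(struct cpu_model *c)
{
	if (c->preemptible && c->wq_work_pending) {
		c->wq_work_pending = false;
		c->flushed += c->buffered;
		c->buffered = 0;
	}
}
```

In this model a message stored while `preemptible` is false stays in the safe buffer across any number of IRQs and only reaches the log buffer after the flag flips, which is the property the workqueue proposal relies on. The cost raised elsewhere in the thread is visible too: if the flag never flips, nothing is flushed.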
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-24 2:11 ` Sergey Senozhatsky
@ 2018-01-24 2:52 ` Steven Rostedt
2018-01-24 4:44 ` Sergey Senozhatsky
0 siblings, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-24 2:52 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Wed, 24 Jan 2018 11:11:33 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> Please take a look.

Was there something specific to look at?

I'm doing a hundred different things at once, and my memory cache keeps
getting flushed.

-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-24 2:52 ` Steven Rostedt
@ 2018-01-24 4:44 ` Sergey Senozhatsky
0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-24 4:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/23/18 21:52), Steven Rostedt wrote:
> On Wed, 24 Jan 2018 11:11:33 +0900
> Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:
>
> > Please take a look.
>
> Was there something specific to look at?

Not really. Just my previous email, basically. You said "I have to look
at the latest code." so I replied. Well, if the proposed direction does
make sense then I'll send out a patch.

> I'm doing a hundred different things at once, and my memory cache...

Meltdown vulnerable? Suddenly it all makes sense - you talk too fast
because of speculative execution... ;)

-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-23 16:01 ` Sergey Senozhatsky
2018-01-23 16:24 ` Steven Rostedt
@ 2018-01-23 17:22 ` Tejun Heo
1 sibling, 0 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-23 17:22 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Sergey Senozhatsky, Petr Mladek, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Hello, Sergey.

On Wed, Jan 24, 2018 at 01:01:53AM +0900, Sergey Senozhatsky wrote:
> On (01/23/18 10:41), Steven Rostedt wrote:
> [..]
> > We can have more. But if printk is causing printks, that's a major bug.
> > And work queues are not going to fix it, it will just spread out the
> > pain. Have it be 100 printks, it needs to be fixed if it is happening.
> > And having all printks just generate more printks is not helpful. Even
> > if we slow them down. They will still never end.
>
> Dropping the messages is not the solution either. The original bug report
> report was - this "locks up my kernel". That's it. That's all people asked
> us to solve.
>
> With WQ we don't lockup the kernel, because we flush printk_safe in
> preemptible context. And people are very much expected to fix the
> misbehaving consoles. But that should not be printk_safe problem.

I really don't care as long as it's robust enough.

Thanks.

--
tejun
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-19 18:20 ` Steven Rostedt 2018-01-20 7:14 ` Sergey Senozhatsky @ 2018-01-20 12:19 ` Tejun Heo 2018-01-20 14:51 ` Steven Rostedt 1 sibling, 1 reply; 140+ messages in thread From: Tejun Heo @ 2018-01-20 12:19 UTC (permalink / raw) To: Steven Rostedt Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel Hello, Steven. On Fri, Jan 19, 2018 at 01:20:52PM -0500, Steven Rostedt wrote: > I was thinking about this a bit more, and instead of offloading a > recursive printk, perhaps its best to simply throttle it. Because the > problem may not go away if a printk thread takes over, because the bug > is really the printk infrastructure filling the printk buffer keeping > printk from ever stopping. > > This patch detects that printk is causing itself to print more and > throttles it after 3 messages have printed due to recursion. Could you > see if this helps your test cases? Sure, if this is the approach we're gonna take, I can try it with the silly test code and also try to reproduce the original problem and see whether this helps. I'm a bit worried tho because this essentially seems like "detect recursion, ignore messages" approach. netcons can have a very large surface for bugs. Suppressing those messages would make them difficult to debug. For example, all our machines have both serial console (thus the slowness) and netconsole hooked up and netcons code has had its fair share of issues. This would likely make tracking down those problems more challenging. Can we discuss pros and cons of this approach against offloading before committing to this? Thanks. -- tejun ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-20 12:19 ` Tejun Heo
@ 2018-01-20 14:51 ` Steven Rostedt
0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-20 14:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Sat, 20 Jan 2018 04:19:53 -0800
Tejun Heo <tj@kernel.org> wrote:

> I'm a bit worried tho because this essentially seems like "detect
> recursion, ignore messages" approach. netcons can have a very large
> surface for bugs. Suppressing those messages would make them
> difficult to debug. For example, all our machines have both serial
> console (thus the slowness) and netconsole hooked up and netcons code
> has had its fair share of issues. This would likely make tracking
> down those problems more challenging.

Well, it's not totally ignoring them. There's a variable that tells
printk how many to print before it starts ignoring them. I picked 3, but
that could very well be 5 or 10. Probably 10 is the best, because then
it would give us enough idea why printk is recursing on itself without
overloading the buffer. And I made it a variable to easily make it a
knob for userspace to tweak if need be.

>
> Can we discuss pros and cons of this approach against offloading
> before committing to this?

I'm open. I was just thinking about the scenario that you mentioned and
what the best way to solve it would be. We need to define the exact
problem(s) we are dealing with before we offer a solution. The one thing
I don't want is a solution looking for a problem. I want a full
understanding of what the problem exactly is and then we can discuss
various solutions, and how they solve the problem(s). Otherwise we are
just doing (to quote Linus) code masturbation.

-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 17:12 ` Steven Rostedt
2018-01-17 18:42 ` Steven Rostedt
@ 2018-01-17 20:05 ` Tejun Heo
2018-01-18 5:43 ` Sergey Senozhatsky
2018-01-18 11:51 ` Petr Mladek
2018-01-18 5:42 ` Sergey Senozhatsky
2 siblings, 2 replies; 140+ messages in thread
From: Tejun Heo @ 2018-01-17 20:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Hello, Steven.

On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> From what I gathered, you said an OOM would trigger, and then the
> network console would not be able to allocate memory and it would
> trigger a printk too, and cause an infinite amount of printks.

Yeah, it falls into back-and-forth loop between the OOM code and
netconsole path.

> This could very well be a great place to force offloading. If a printk
> is called from within a printk, at the same context (normal, softirq,
> irq or NMI), then we should trigger the offloading.

I was thinking more of a timeout based approach (ie. if stuck for
longer than X or X messages, offload), but if local feedback loop is
the only thing we're missing after your improvements, detecting that
specific condition definitely works and is likely a better approach in
terms of message delivery guarantee.

> +static void kick_offload_thread(void)
> +{
> +	/*
> +	 * Consoles are triggering printks, offload the printks
> +	 * to another CPU to hopefully avoid a lockup.
> +	 */
> +}
...
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>
>  	for (;;) {
>  		struct printk_log *msg;
> +		bool offload;
>  		size_t ext_len = 0;
>  		size_t len;
>
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();

Yeah, something like this would definitely work.

Thanks a lot.

--
tejun
^ permalink raw reply [flat|nested] 140+ messages in thread
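The recursion check quoted above ("a printk is called from within a printk, at the same context") can be pictured with a small user-space model. This is only a sketch and not Steven's implementation: the bodies of recursion_check_start() and recursion_check_finish() are not shown in this part of the thread, so the `model_*` functions, the `enum ctx` values and the counters below are all hypothetical. The model also pretends the nested printk re-enters the console loop directly and ignores printk_safe redirection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical contexts, mirroring "normal, softirq, irq or NMI". */
enum ctx { CTX_NORMAL, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, CTX_MAX };

/* A real kernel would keep this per CPU; a single CPU is modeled here. */
static int in_console[CTX_MAX];	/* console-call nesting depth per context */
static int offload_kicks;	/* how often the offload stand-in was hit */

/* Returns true when a console call is already active in this context. */
static bool model_recursion_check_start(enum ctx c)
{
	bool nested = in_console[c] > 0;

	in_console[c]++;
	return nested;
}

static void model_recursion_check_finish(enum ctx c)
{
	assert(in_console[c] > 0);
	in_console[c]--;
}

/*
 * Model of one pass through the console loop body. driver_printks is
 * how many times the fake console driver calls back into printk, as a
 * network console might when an allocation fails under OOM.
 */
static void model_console_pass(enum ctx c, int driver_printks)
{
	bool offload = model_recursion_check_start(c);

	/* Stand-in for call_console_drivers(). */
	if (driver_printks > 0)
		model_console_pass(c, driver_printks - 1);

	model_recursion_check_finish(c);

	if (offload)
		offload_kicks++;	/* stand-in for kick_offload_thread() */
}
```

Calling model_console_pass(CTX_NORMAL, 0) never reaches the offload branch, while a driver that prints from inside the console call does so once per nested pass. A printk from a different context, such as an interrupt arriving during a normal-context console call, is not counted as recursion in this model.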
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 20:05 ` Tejun Heo
@ 2018-01-18 5:43 ` Sergey Senozhatsky
2018-01-18 11:51 ` Petr Mladek
1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18 5:43 UTC (permalink / raw)
To: Tejun Heo
Cc: Steven Rostedt, Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/17/18 12:05), Tejun Heo wrote:
[..]
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
>
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload)

yep, that's what I want. for a whole bunch of different reasons.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 20:05 ` Tejun Heo
2018-01-18 5:43 ` Sergey Senozhatsky
@ 2018-01-18 11:51 ` Petr Mladek
1 sibling, 0 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-18 11:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Wed 2018-01-17 12:05:51, Tejun Heo wrote:
> Hello, Steven.
>
> On Wed, Jan 17, 2018 at 12:12:51PM -0500, Steven Rostedt wrote:
> > From what I gathered, you said an OOM would trigger, and then the
> > network console would not be able to allocate memory and it would
> > trigger a printk too, and cause an infinite amount of printks.
>
> Yeah, it falls into back-and-forth loop between the OOM code and
> netconsole path.
>
> > This could very well be a great place to force offloading. If a printk
> > is called from within a printk, at the same context (normal, softirq,
> > irq or NMI), then we should trigger the offloading.
>
> I was thinking more of a timeout based approach (ie. if stuck for
> longer than X or X messages, offload), but if local feedback loop is
> the only thing we're missing after your improvements, detecting that
> specific condition definitely works and is likely a better approach in
> terms of message delivery guarantee.

I think that we could combine both. The recursion can be detected
rather easily and immediately so there is no reason to wait.
Once we have the code for offloading from recursion then we could
kick_offload_thread() also from other reasons, e.g. when
console_unlock() takes too long. I think that Sergey is already
playing with this.

It seems that we all could be happy in the end.

Best Regards,
Petr

PS: I am sorry for the answer yesterday. Tejun's mail did not mention
any details about the problem. I evidently forgot them. I have OOM
and printk issues associated with Tetsuo. So I messed it.

Believe me. It is a big relief to realize that we are not in
the cycle again.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 17:12 ` Steven Rostedt
2018-01-17 18:42 ` Steven Rostedt
2018-01-17 20:05 ` Tejun Heo
@ 2018-01-18 5:42 ` Sergey Senozhatsky
2 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18 5:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: Tejun Heo, Petr Mladek, Sergey Senozhatsky, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/17/18 12:12), Steven Rostedt wrote:
[..]
>  /*
>   * Can we actually use the console at this time on this cpu?
> @@ -2333,6 +2390,7 @@ void console_unlock(void)
>
>  	for (;;) {
>  		struct printk_log *msg;
> +		bool offload;
>  		size_t ext_len = 0;
>  		size_t len;
>
> @@ -2393,15 +2451,20 @@ void console_unlock(void)
>  		 * waiter waiting to take over.
>  		 */
>  		console_lock_spinning_enable();
> +		offload = recursion_check_start();
>
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
>
> +		recursion_check_finish(offload);
> +
>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);
>  			return;
>  		}
> +		if (offload)
> +			kick_offload_thread();
>
>  		printk_safe_exit_irqrestore(flags);
		^^^^^^^^^^^^^^^^

but we call console drivers in printk_safe.
printk -> console_drivers -> printk will be redirected to this-CPU
printk_safe buffer.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-12 1:30 ` Steven Rostedt
2018-01-12 2:55 ` Steven Rostedt
@ 2018-01-12 3:12 ` Sergey Senozhatsky
1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-12 3:12 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/11/18 20:30), Steven Rostedt wrote:
[..]
> Today, printk() can print for a time of A * B, where, as you state
> above:
>
>	A is the amount of data to print in the worst case
>	B the time call_console_drivers() needs to print a single
>	  char to all registered and enabled consoles
>
> In the worse case, the current approach is A is infinite. That is,
> printk() never stops, as long as there is a printk happening on another
> CPU before B can finish. A will keep growing. The call to printk() will
> never return. The more CPUs you have, the more likely this will occur.
> All it takes is a few CPUs doing periodic printks. If there is a slow
> console, where the periodic printk on other CPUs occur quicker than the
> first can finish, the first one will be stuck forever. Doesn't take
> much to have this happen.

console_sem owner can stuck in console_unlock() not because of
printk-s happening right now on other CPUs, but because those printk-s
could have happened while console_sem owner was preempted. when it
comes back it has a ton of pending messages. I said it before -
"we stuck in console_unlock() because others CPUs printk right now
a lot" is not always true. we have preemption. and the "last
console_sem owner prints it all" is not good in this case.

> With my patch, A is fixed to the size of the buffer. A single printk()
> can never print more than that. If another CPU comes in and does a
> printk, then it will take over the task of printing, and release the
> first printk.

yes. and "another CPU" that comes to take over has to print all the
pending messages. from whatever context it's currently in. and
bringing A * B below C can be quite tricky, if possible at all (!).
most likely people will just add more touch_nmi_watchdog().

again, I don't disagree on "let's bound printk". yes, we totally
should! but the bound must be realistic if we want to fix the damn
thing (either with printk_kthread, or hand off, or anything else).

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-11 16:29 ` Steven Rostedt 2018-01-12 1:30 ` Steven Rostedt @ 2018-01-12 2:56 ` Sergey Senozhatsky 2018-01-12 3:21 ` Steven Rostedt 1 sibling, 1 reply; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-12 2:56 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel Hi, On (01/11/18 11:29), Steven Rostedt wrote: [..] > > - if the patch's goal is to bound (not necessarily to watchdog's threshold) > > the amount of time we spend in console_unlock(), then the patch is kinda > > overcomplicated. but no further questions in this case. > > It's goal is to keep printk from running amok on a single CPU like it > currently does. This prevents one printk from never ending. And it is > far from complex. It doesn't deal with "offloading". The "handover" is > only done to those that are doing printks. What do you do if all CPUs > are in "critical sections", how would a "handoff to safe" work? Will > the printks never get out? If the machine were to triple fault and > reboot, we lost all of it. make printk_kthread to be just one of the things that compete for handed off console_sem, along with other CPUs. > > - but if the patch's goal is to bound (to lockup threshold) the amount of > > time spent in console_unlock() in order to avoid lockups [uh, a reason], > > then the patch is rather oversimplified. > > It's bound to print all the information that has been added to the > printk buffer. You want to bound it to some "time" not some... it's aligned with watchdog expectations. which is deterministic, isn't it? > My method, there's really no delay between a hand off. There's always > an active CPU doing printing. 
It matches the current method which works > well for getting information out. A delayed approach will break no, not necessarily. and my previous patch set had some bits of that "combined offloading and hand off" behaviour. I was thinking about extending it further, but decided not to. - printk_kthread would spin on console_owner until current console_sem hand off. > > claiming that for any given A, B, C the following is always true > > > > A * B < C > > > > where > > A is the amount of data to print in the worst case > > B the time call_console_drivers() needs to print a single > > char to all registered and enabled consoles > > C the watchdog's threshold > > > > is not really a step forward. > > It's no different than what we have, except that we currently have A > being infinite. My patch makes A no longer infinite, but a constant. my point is - the constant can be unrealistically high. and can easily overlap watchdog_threshold, returning printk back to unbound land. IOW, if your bound is above the watchdog threshold then you don't have any bounds. by example, with console=ttyS1,57600n8 - keep increasing the watchdog_threshold until watchdog stops complaining? or - keep reducing the logbuf size until it can be flushed under watchdog_threshold seconds? and I demonstrated how exactly we end up having a full logbuf of pending messages even on systems with faster consoles. [..] > Great, and there's cases that die that my patch solves. Lets add my > patch now since it is orthogonal to an offloading approach and see how > it works, because it would solve issues that I have hit. If you can > show that this isn't good enough we can add another approach. it bounds printk. yes, good! that's what I want. but it bounds it to a wrong value. I want more deterministic and close to reality bound. and I also want to get rid of "the last console_sem owner prints it all" thing. I demonstrated with the traces how that thing can bite. 
> Honestly, I don't see why you are against this patch. prove it! show me exactly when and where I said that I NACK or block the patch? seriously. > It doesn't stop your work. and I never said it would. your patch changes nothing on my side, that's my message. as of now I have out-of-tree patches, well I'll keep using them. nothing new. > If this patch isn't enough BINGO! this is all I'm trying to say. and the only reply (if there is any at all!) I'm getting is "GTFO!!! your problems are unrealistic! we gonna release the patch and wait for someone to come along and say us something new about printk issues. but not you!". > (but it does fix some issues) obviously there are cases which your patch addresses. have I ever denied that? but, once again, obviously, there are cases which it doesn't. and those cases tend to bite my setups. I have repeated it many times, and have explained in great details which parts I'm talking about. and I have never run unrealistic test_printk.ko against your patch or anything alike; why the heck would I do that. > Really, it sounds like you are afraid of this patch, that it might > be good enough for most cases which would make adding another approach > even more difficult. LOL! wish I knew how to capture screenshots on Linux! -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
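To put rough numbers on the A * B < C argument above, here is a back-of-the-envelope sketch for the console=ttyS1,57600n8 example. It rests on assumptions that are not stated in the thread: one start bit and one stop bit per byte (10 bits on the wire), a full 128 KiB log buffer as A, and a 10 second watchdog threshold as C. B is modeled as pure wire time, so driver overhead and additional consoles would only make things worse. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes per second on an 8N1 line: start bit + 8 data bits + stop bit. */
static uint64_t serial_bytes_per_sec(uint64_t baud)
{
	assert(baud >= 10);
	return baud / 10;
}

/* Time in milliseconds to push pending_bytes (A) out at one byte per B. */
static uint64_t flush_time_ms(uint64_t pending_bytes, uint64_t baud)
{
	return pending_bytes * 1000 / serial_bytes_per_sec(baud);
}

/* True when A * B < C, i.e. the flush fits under the watchdog threshold. */
static bool fits_under_watchdog(uint64_t pending_bytes, uint64_t baud,
				uint64_t threshold_ms)
{
	return flush_time_ms(pending_bytes, baud) < threshold_ms;
}
```

Under these assumptions a 57600 baud line moves 5760 bytes per second, so flushing 131072 pending bytes takes a little under 23 seconds, well past a 10 second threshold. Only a backlog below roughly 57600 bytes would fit, which is the "keep reducing the logbuf size" option mentioned above.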
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 2:56 ` Sergey Senozhatsky @ 2018-01-12 3:21 ` Steven Rostedt 2018-01-12 10:05 ` Sergey Senozhatsky 0 siblings, 1 reply; 140+ messages in thread From: Steven Rostedt @ 2018-01-12 3:21 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Fri, 12 Jan 2018 11:56:12 +0900 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > Hi, > > On (01/11/18 11:29), Steven Rostedt wrote: > [..] > > > - if the patch's goal is to bound (not necessarily to watchdog's threshold) > > > the amount of time we spend in console_unlock(), then the patch is kinda > > > overcomplicated. but no further questions in this case. > > > > It's goal is to keep printk from running amok on a single CPU like it > > currently does. This prevents one printk from never ending. And it is > > far from complex. It doesn't deal with "offloading". The "handover" is > > only done to those that are doing printks. What do you do if all CPUs > > are in "critical sections", how would a "handoff to safe" work? Will > > the printks never get out? If the machine were to triple fault and > > reboot, we lost all of it. > > make printk_kthread to be just one of the things that compete for > handed off console_sem, along with other CPUs. Are you going to make printk thread a high priority task? > > > > - but if the patch's goal is to bound (to lockup threshold) the amount of > > > time spent in console_unlock() in order to avoid lockups [uh, a reason], > > > then the patch is rather oversimplified. > > > > It's bound to print all the information that has been added to the > > printk buffer. You want to bound it to some "time" > > not some... 
it's aligned with watchdog expectations. > which is deterministic, isn't it? When do you start the timer? What you are trying to solve isn't a single printk that gets stuck. Just look at Tejun's module. To trigger what he wanted, he had to do 10,000 printks from an interrupt context. > > > My method, there's really no delay between a hand off. There's always > > an active CPU doing printing. It matches the current method which works > > well for getting information out. A delayed approach will break > > no, not necessarily. and my previous patch set had some bits of that > "combined offloading and hand off" behaviour. I was thinking about > extending it further, but decided not to. - printk_kthread would spin > on console_owner until current console_sem hand off. Is printk_thread always running, taking up CPU cycles? > > > > claiming that for any given A, B, C the following is always true > > > > > > A * B < C > > > > > > where > > > A is the amount of data to print in the worst case > > > B the time call_console_drivers() needs to print a single > > > char to all registered and enabled consoles > > > C the watchdog's threshold > > > > > > is not really a step forward. > > > > It's no different than what we have, except that we currently have A > > being infinite. My patch makes A no longer infinite, but a constant. > > my point is - the constant can be unrealistically high. and can > easily overlap watchdog_threshold, returning printk back to unbound > land. IOW, if your bound is above the watchdog threshold then you > don't have any bounds. That makes no sense. > > by example, with console=ttyS1,57600n8 > - keep increasing the watchdog_threshold until watchdog stops > complaining? > or > - keep reducing the logbuf size until it can be flushed under > watchdog_threshold seconds? After playing with the module in my last email, I think your trying to solve multiple printks, not one that is stuck. 
I'm solving the one that is stuck problem, which was easily triggered by a simple (non stess test) module. > > > and I demonstrated how exactly we end up having a full logbuf of pending > messages even on systems with faster consoles. Where did you demonstrate that. There's so many emails I can't keep up. But still, take a look at my simple module. I locked up the system immediately with something that shouldn't have locked up the system. And my patch fixed it. I think that speaks louder than any of our opinions. > > > [..] > > Great, and there's cases that die that my patch solves. Lets add my > > patch now since it is orthogonal to an offloading approach and see how > > it works, because it would solve issues that I have hit. If you can > > show that this isn't good enough we can add another approach. > > it bounds printk. yes, good! that's what I want. but it bounds it to a > wrong value. I want more deterministic and close to reality bound. > and I also want to get rid of "the last console_sem owner prints it all" > thing. I demonstrated with the traces how that thing can bite. I have not seen any realistic traces, but perhaps I missed something. It all requires lots of printks, in weird scenarios. I demonstrated that the system can be locked up with few printks (one per cpu per millisecond), and my patch solves it. > > > > Honestly, I don't see why you are against this patch. > > prove it! show me exactly when and where I said that I NACK or > block the patch? seriously. Why are we having this discussion then? Just give your Ack to my patch, and we can look to see if we need to improve on it. > > > > It doesn't stop your work. > > and I never said it would. your patch changes nothing on my side, that's > my message. as of now I have out-of-tree patches, well I'll keep using > them. nothing new. > > > > If this patch isn't enough > > BINGO! this is all I'm trying to say. > and the only reply (if there is any at all!) I'm getting is > "GTFO!!! 
your problems are unrealistic! we gonna release the > patch and wait for someone to come along and say us something > new about printk issues. but not you!". I think we are misunderstanding each other. It didn't seem that you were on board with this patch. Why didn't you just say, "here's my ack for this patch, but we are going to need more"? This could just be that we are misunderstanding each other. I've been saying from the beginning, that my patch is an incremental approach. But I never got the "OK" from you about it. You just pointed out what you thought was its short comings. Yes, you never actually NACK'd it (like Tejun did), but you never gave it your blessing either. > > > > (but it does fix some issues) > > obviously there are cases which your patch addresses. have I ever > denied that? but, once again, obviously, there are cases which it > doesn't. and those cases tend to bite my setups. I have repeated > it many times, and have explained in great details which parts I'm > talking about. Well, I could argue that the cases you are trying to solve were intensified by the bug my patch fixes. > > and I have never run unrealistic test_printk.ko against your patch > or anything alike; why the heck would I do that. > > > > Really, it sounds like you are afraid of this patch, that it might > > be good enough for most cases which would make adding another approach > > even more difficult. > > LOL! wish I knew how to capture screenshots on Linux! OK, if you are fine with my patch, just give it an Ack, and we push it into the wild and see what happens. If things go as you say, not good enough, then we can add your approach. I never veered from this. It just appeared that you didn't want this patch to go in without your additions. -- Steve ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-12 3:21 ` Steven Rostedt
@ 2018-01-12 10:05 ` Sergey Senozhatsky
2018-01-12 12:21 ` Steven Rostedt
0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-12 10:05 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Steven, we are having too many things in one email, I've dropped most
of them to concentrate on one topic only.

On (01/11/18 22:21), Steven Rostedt wrote:
[..]
>
> After playing with the module in my last email, I think your trying to
> solve multiple printks, not one that is stuck

I wouldn't say so. I'm trying to fix the same thing. but when system has
additional limitations - there are NO concurrent printk-s to hand off to
and A * B > C, so we can't have "last console_sem prints it all" bounded
to O(A * B).

- no concurrent printk-s to hand off is explainable - preemption under
  console_sem and the fact that console_sem is a sleeping lock.

- on a system with slow consoles A * B > C is also pretty clear.

- slow consoles make preemption under console_sem more likely.


to summarize:

1) I have a slow serial console. call_console_drivers() is significantly
   slower than log_store().

   the disproportion can be 1:1000. that is while CPUA prints a single
   logbuf message, other CPUs can add 1000 new entries.

2) not every CPU that stuck in console_unlock() came there through printk().
   CPUs that directly call console_lock() can sleep under console_sem. a bunch
   of printk-s can happen in the meantime -- OOM can happen in the meantime;
   no hand off will happen.

3) console_unlock(void)
   {
	for (;;) {
		printk_safe_enter_irqsave(flags);
		// lock-unlock logbuf
		call_console_drivers(ext_text, ext_len, text, len);
		printk_safe_exit_irqrestore(flags);
	}
   }

   with slow serial console, call_console_drivers() takes enough time
   to make preemption of a current console_sem owner right after it irqrestore()
   highly possible; unless there is a spinning console_waiter. which easily may
   not be there; but can come in while current console_sem is preempted, why not.
   so when preempted console_sem owner comes back - it suddenly has a whole bunch
   of new messages to print and no one to hand off printing to. in a super
   imperfect and ugly world, BTW, this is how console_unlock() still can be
   O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
   to cond_resched() after every line it prints] from current console_sem
   owner and printk() while console_sem owner is scheduled out.

4) the interesting thing here is that call_console_drivers() can
   cause console_sem owner to schedule even if it has handed off the
   ownership. because waiting CPU has to spin with local IRQs disabled
   as long as call_console_drivers() prints its message. so if consoles
   are slow, then the first thing the waiter will face after it receives
   the console_sem ownership and enables the IRQs is - preemption.

   so hand off is not immediate. there is a possibility of re-scheduling
   between hand off and actual printing. so that "there is always an active
   printing CPU" is not quite true.

   vprintk_emit()
   {

	console_trylock_spinning(void)
	{
		printk_safe_enter_irqsave(flags);
		while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU
			cpu_relax();
		printk_safe_exit_irqrestore(flags);
   --->	}
   |	// preemptible up until printk_safe_enter_irqsave() in console_unlock()
   |	console_unlock()
   |	{
   |
   |		....
   |		for (;;) {
   |-------------->	printk_safe_enter_irqsave(flags);
			....
		}

	}
   }

reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
thing after all.

preemption latencies can be high. especially during OOM. I went
through reports that Tetsuo provided over the years. On some of
his tests preempted console_sem owner can sleep long enough to
let other CPUs to start overflowing the logbuf with the pending
messages.

more on preemption. see this email, for instance. a bunch of links in
the middle, scroll down:
https://marc.info/?l=linux-kernel&m=151375384500555


BTW, note the disclaimer [in capitals] -

	LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT
	GOING TO BLOCK IT.

> > and I demonstrated how exactly we end up having a full logbuf of pending
> > messages even on systems with faster consoles.
>
> Where did you demonstrate that. There's so many emails I can't keep up.
>
> But still, take a look at my simple module. I locked up the system
> immediately with something that shouldn't have locked up the system.
> And my patch fixed it. I think that speaks louder than any of our
> opinions.

sure it will!
you don't have scheduler latencies mixed in under console_sem (neither in
vprintk_emit(), nor in console_unlock(), nor anywhere in between), you have
printks only from non-preemptible contexts, so your hand off logic always
works and is never preempted, you have concurrent printks from many CPUs,
so once again your hand off logic always works, and you have fast console,
and, due to hand off, console_sem is never up() so no schedulable context
can ever acquire it - you pass it between non-preemptible printk CPUs only.
I cannot see why your patch would not help. your patch works fine in these
conditions, I said it many times. and I have no issues with that. my setups
(real HW, by the way) are far from those conditions. but there is an active
denial of that.

anyway.
like I said weeks ago and repeated it in several emails: I have no
intention to NACK or block the patch.
but the patch is not doing enough. that's all I'm saying.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
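Points 2) and 4) above turn on the difference between holding console_sem and being the console owner. The following single-threaded model is a sketch of that distinction, based only on the design described in the patch posting: an owner is set only while console drivers are being called, and at most one waiter may register while an owner is active. It is not the kernel code. The struct, the `model_*` helpers and the CPU numbers are invented for illustration, and real spinning, IRQ masking and preemption are not simulated.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

struct console_model {
	int sem_holder;	/* CPU holding console_sem, or NO_CPU */
	int owner;	/* CPU actively calling console drivers, or NO_CPU */
	int waiter;	/* CPU waiting to take over, or NO_CPU */
};

/* Stand-in for console_trylock(): succeeds only if nobody holds the sem. */
static bool model_trylock(struct console_model *m, int cpu)
{
	if (m->sem_holder != NO_CPU)
		return false;
	m->sem_holder = cpu;
	return true;
}

/* The holder is about to call the console drivers: it becomes the owner. */
static void model_owner_enable(struct console_model *m, int cpu)
{
	assert(m->sem_holder == cpu);
	m->owner = cpu;
}

/*
 * A printk() caller that failed the trylock may register as waiter only
 * if there is an active owner other than itself and no waiter yet.
 */
static bool model_try_wait(struct console_model *m, int cpu)
{
	if (m->owner == NO_CPU || m->owner == cpu || m->waiter != NO_CPU)
		return false;
	m->waiter = cpu;
	return true;
}

/*
 * The owner finished one message. Returns true if the lock was handed
 * to a waiter, in which case the old holder must stop printing.
 */
static bool model_owner_disable_and_check(struct console_model *m)
{
	m->owner = NO_CPU;
	if (m->waiter == NO_CPU)
		return false;
	m->sem_holder = m->waiter;
	m->waiter = NO_CPU;
	return true;
}
```

In this model a CPU that holds the semaphore but is not currently the owner, for instance because it took console_lock() directly and then slept, cannot be waited on at all, which is the "no hand off will happen" case from point 2).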
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 10:05 ` Sergey Senozhatsky @ 2018-01-12 12:21 ` Steven Rostedt 2018-01-12 12:55 ` Petr Mladek 2018-01-13 7:28 ` Sergey Senozhatsky 0 siblings, 2 replies; 140+ messages in thread From: Steven Rostedt @ 2018-01-12 12:21 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Fri, 12 Jan 2018 19:05:44 +0900 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > Steven, we are having too many things in one email, I've dropped most > of them to concentrate on one topic only. I totally agree, and I believe this is the reason behind the tensions between us. We are not discussing the topic of the patch. > > On (01/11/18 22:21), Steven Rostedt wrote: > [..] > > > > After playing with the module in my last email, I think your trying to > > solve multiple printks, not one that is stuck > > I wouldn't say so. I'm trying to fix the same thing. but when system has > additional limitations - there are NO concurrent printk-s to hand off to > and A * B > C, so we can't have "last console_sem prints it all" bounded > to O(A * B). > > - no concurrent printk-s to hand off is explainable - preemption under > console_sem and the fact that console_sem is a sleeping lock. > > - on a system with slow consoles A * B > C is also pretty clear. > > - slow consoles make preemption under console_sem more likely. > > > to summarize: > > 1) I have a slow serial console. call_console_drivers() is significantly > slower than log_store(). > > the disproportion can be 1:1000. that is while CPUA prints a single > logbuf message, other CPUs can add 1000 new entries. > > 2) not every CPU that stuck in console_unlock() came there through printk(). 
> CPUs that directly call console_lock() can sleep under console_sem. a bunch > of printk-s can happen in the meantime -- OOM can happen in the meantime; > no hand off will happen. Yep, but I'm still not convinced you are seeing an issue with a single printk. An OOM does not do everything in one printk, it calls hundreds. Having hundreds of printks is an issue, especially in critical sections. The thing is, all of your analysis has been done on a system with the bug my patch fixes. The bug being, that any printk has no limit to how much it can print, regardless of logbuf size. When debugging an issue, if I find a bug that can affect that issue, although it may not be the cause, I fix that first, and start over looking at the original issue, because that bug fix can have an effect, and in lots of cases, fixing the bug makes the fix for the original bug easier. There's two issues here: #1) The bug I'm fixing. printk() can get stuck printing forever. I demonstrated this by a simple module, that locked up the system by doing something that was not stressful. #2) The bug you are seeing, where printk can trigger the watchdog timer. This is much harder to hit. I have not seen any simple module that can trigger it. This patch series is focused on fixing #1, #2 is out of scope, and continuing discussing it will just cause us to argue more. > > 3) console_unlock(void) > { > for (;;) { > printk_safe_enter_irqsave(flags); > // lock-unlock logbuf > call_console_drivers(ext_text, ext_len, text, len); > printk_safe_exit_irqrestore(flags); > } > } > > with slow serial console, call_console_drivers() takes enough time to > to make preemption of a current console_sem owner right after it irqrestore() > highly possible; unless there is a spinning console_waiter. which easily may > not be there; but can come in while current console_sem is preempted, why not. 
> so when preempted console_sem owner comes back - it suddenly has a whole bunch > of new messages to print and on one to hand off printing to. in a super > imperfect and ugly world, BTW, this is how console_unlock() still can be > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about PREEMPT kernels than !PREEMPT ones. > to cond_resched() after every line it prints] from current console_sem > owner and printk() while console_sem owner is scheduled out. > > 4) the interesting thing here is that call_console_drivers() can > cause console_sem owner to schedule even if it has handed off the > ownership. because waiting CPU has to spin with local IRQs disabled > as long as call_console_drivers() prints its message. so if consoles > are slow, then the first thing the waiter will face after it receives > the console_sem ownership and enables the IRQs is - preemption. If the waiter is preempted, that means its not in a critical section. Isn't that what you want? > > so hand off is not immediate. there is a possibility of re-scheduling > between hand off and actual printing. so that "there is always an active > printing CPU" is not quite true. > > vprintk_emit() > { > > console_trylock_spinning(void) > { > printk_safe_enter_irqsave(flags); > while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU > cpu_relax(); > printk_safe_exit_irqrestore(flags); > ---> } > | // preemptible up until printk_safe_enter_irqsave() in console_unlock() Again, this means the waiter is not in a critical section. Why do we care? You bring up a good point, that shows that my patch helps you statistically. We want printks that are not in critical sections (interrupts or preemption disabled) to do the most work. 
With my patch, those that call printk in an atomic section, are the ones most likely not have to print more than what they are printing. Because they will have the console lock without having "console ownership" for the shortest time. Remember, there is no hand off if you own console lock without console ownership. Those that can be preempted, are most likely to have console lock without console ownership, and have to do the most printing. > | console_unlock() > | { > | > | .... > | for (;;) { > |--------------> printk_safe_enter_irqsave(flags); > .... > } > > } > } > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right > thing after all. I would analyze that more before doing so. Because with my patch, I think we make those that can do long prints (without triggering a watchdog), the ones most likely doing the long prints. > > preemption latencies can be high. especially during OOM. I went > through reports that Tetsuo provided over the years. On some of > his tests preempted console_sem owner can sleep long enough to > let other CPUs to start overflowing the logbuf with the pending > messages. Sure, that's fine. Because if the one that has console_lock can be preempted, it should be fine to take time to do printks. > > more on preemption. see this email, for instance. a bunch of links in > the middle, scroll down: > https://marc.info/?l=linux-kernel&m=151375384500555 > > > BTW, note the disclaimer [in capitals] - > > LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT > GOING TO BLOCK IT. GREAT! Then we can continue this conversation after the patch goes in. Because I'm focused on fixing #1 above. > > > > > and I demonstrated how exactly we end up having a full logbuf of pending > > > messages even on systems with faster consoles. > > > > Where did you demonstrate that. There's so many emails I can't keep up. > > > > But still, take a look at my simple module. 
I locked up the system > > immediately with something that shouldn't have locked up the system. > > And my patch fixed it. I think that speaks louder than any of our > > opinions. > > sure it will! > you don't have scheduler latencies mixed in under console_sem (neither in > vprintk_emit(), nor in console_unlock(), nor anywhere in between), you have > printks only from non-preemptible contexts, so your hand off logic always > works and is never preempted, you have concurrent printks from many CPUs, > so once again your hand off logic always works, and you have fast console, > and, due to hand off, console_sem is never up() so no schedulable context > can ever acquire it - you pass it between non-preemptible printk CPUs only. > I cannot see why your patch would not help. your patch works fine in these > conditions, I said it many times. and I have no issues with that. my setups > (real HW, by the way) are far from those conditions. but there is an active > denial of that. OK, I modified my module to include a loop variable. You can add in a loop variable and the printer now does this: while (!READ_ONCE(stop_testing)) { for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) { if (i & 1) preempt_disable(); pr_emerg("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT"); if (i & 1) preempt_enable(); } msleep(1); } So I do the printk "loops" times (defined by what variable you put in as the module parameter). With my patch, I ran it with 10, then 100 and then 100000! (It's still running). Every other printk is done with preemption enabled. Is this what you mean? I ran this with my patch with and without serial enabled (with hyper-threading on 8 CPUs). Runs fine. 100,000 loops! Yes, and with CONFIG_PREEMPT=y Note, doing the preemption makes it harder to lock up the current kernel. I was not able to lock it up even with serial console. 
This goes to show that having printk called with preemption enabled, makes the preempted printk much more likely to be the one stuck doing the preemption. That means, statistically, the "safe" printks will be the more likely one to print. In fact, I had to add another option to my module to make it go back to only calling printk without preemption enabled. That locks up the kernel again with a slow console. Then I ran this without serial enabled (just VGA) on the kernel without my patch. With the printk always being called with preemption disabled, it only took loops=100 before to make it lock up! Yes, I'm able to lock up the kernel with no slow console, with a simple loop of 100 printks. Where my patch allows me to do 100,000 printks in that loop and I hardly notice it. But this only locks up if all printks are called without preemption (call my module with preempt=1). If I can lock up the kernel with a single fast console, with only a 100 printks per millisecond, I think that's a pretty serious bug. And my patch fixes it. I was not able lock up the system when calling printk with preemption enabled with or without serial on the current kernel. I think this shows that my point that statistically, a preemptable printk is more likely to get stuck doing the slow prints. And since it can be preempted, it doesn't affect the system at all. And the more it gets preempted, the more likely it will continue doing the prints. Which is a good thing. > > anyway. like I said weeks ago and repeated it in several emails: I have > no intention to NACK or block the patch. > but the patch is not doing enough. that's all I'm saying. > Great, then Petr can start pushing this through. 
Below is my latest module I used for testing:

-- Steve

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool stop_testing;
static unsigned int loops = 1;
static int preempt;

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	while (!READ_ONCE(stop_testing)) {
		for (i = 0; i < loops && !READ_ONCE(stop_testing); i++) {
			bool no_preempt = preempt || (i & 1);

			if (no_preempt)
				preempt_disable();
			pr_emerg("%5d%-75s\n", smp_processor_id(),
				 no_preempt ? " XXX NOPREEMPT" : " XXX PREEMPT");
			if (no_preempt)
				preempt_enable();
		}
		msleep(1);
	}
}

static struct work_struct __percpu *works;

static void finish(void)
{
	int cpu;

	WRITE_ONCE(stop_testing, true);
	for_each_online_cpu(cpu)
		flush_work(per_cpu_ptr(works, cpu));
	free_percpu(works);
}

static int __init test_init(void)
{
	int cpu;

	works = alloc_percpu(struct work_struct);
	if (!works)
		return -ENOMEM;

	/*
	 * This is just a test module. This will break if you
	 * do any CPU hot plugging between loading and
	 * unloading the module.
	 */
	for_each_online_cpu(cpu) {
		struct work_struct *work = per_cpu_ptr(works, cpu);

		INIT_WORK(work, &preempt_printk_workfn);
		schedule_work_on(cpu, work);
	}

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_param(loops, uint, 0);
module_param(preempt, int, 0);

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 12:21 ` Steven Rostedt @ 2018-01-12 12:55 ` Petr Mladek 2018-01-13 7:31 ` Sergey Senozhatsky 2018-01-15 12:08 ` Steven Rostedt 2018-01-13 7:28 ` Sergey Senozhatsky 1 sibling, 2 replies; 140+ messages in thread From: Petr Mladek @ 2018-01-12 12:55 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On Fri 2018-01-12 07:21:23, Steven Rostedt wrote: > On Fri, 12 Jan 2018 19:05:44 +0900 > Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote: > > 3) console_unlock(void) > > { > > for (;;) { > > printk_safe_enter_irqsave(flags); > > // lock-unlock logbuf > > call_console_drivers(ext_text, ext_len, text, len); > > printk_safe_exit_irqrestore(flags); > > } > > } > > > > with slow serial console, call_console_drivers() takes enough time to > > to make preemption of a current console_sem owner right after it irqrestore() > > highly possible; unless there is a spinning console_waiter. which easily may > > not be there; but can come in while current console_sem is preempted, why not. > > so when preempted console_sem owner comes back - it suddenly has a whole bunch > > of new messages to print and on one to hand off printing to. in a super > > imperfect and ugly world, BTW, this is how console_unlock() still can be > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about > PREEMPT kernels than !PREEMPT ones. I would say that the patch improves also console_unlock() but only in non-preemttive context. 
By other words, it makes console_unlock() finite in preemptible context (limited by buffer size). It might still be unlimited in non-preemtible context. > > to cond_resched() after every line it prints] from current console_sem > > owner and printk() while console_sem owner is scheduled out. > > > > 4) the interesting thing here is that call_console_drivers() can > > cause console_sem owner to schedule even if it has handed off the > > ownership. because waiting CPU has to spin with local IRQs disabled > > as long as call_console_drivers() prints its message. so if consoles > > are slow, then the first thing the waiter will face after it receives > > the console_sem ownership and enables the IRQs is - preemption. > > so hand off is not immediate. there is a possibility of re-scheduling > > between hand off and actual printing. so that "there is always an active > > printing CPU" is not quite true. > > > > vprintk_emit() > > { > > > > console_trylock_spinning(void) > > { > > printk_safe_enter_irqsave(flags); > > while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU > > cpu_relax(); > > printk_safe_exit_irqrestore(flags); > > ---> } > > | // preemptible up until printk_safe_enter_irqsave() in console_unlock() > > | console_unlock() > > | { > > | > > | .... > > | for (;;) { > > |--------------> printk_safe_enter_irqsave(flags); > > .... > > } > > > > } > > } > > > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right > > thing after all. > > I would analyze that more before doing so. Because with my patch, I > think we make those that can do long prints (without triggering a > watchdog), the ones most likely doing the long prints. IMHO, it might make sense because it would help to see the messages faster. But I would prefer to handle this separately because it might also increase the risk of softlockups. Therefore it might cause regressions. 
We should also take into account the commit 8d91f8b15361dfb438ab6
("printk: do cond_resched() between lines while outputting to
consoles"). It has the same effect for console_lock() callers.

> > BTW, note the disclaimer [in capitals] -
> >
> > LIKE I SAID, IF STEVEN OR PETR WANT TO PUSH THE PATCH, I'M NOT
> > GOING TO BLOCK IT.
>
> GREAT! Then we can continue this conversation after the patch goes in.
> Because I'm focused on fixing #1 above.

Thanks for the disclaimer!

> > anyway. like I said weeks ago and repeated it in several emails: I have
> > no intention to NACK or block the patch.
> > but the patch is not doing enough. that's all I'm saying.
>
> Great, then Petr can start pushing this through.
>
> Below is my latest module I used for testing:

I am going to send v6 with fixes suggested for the 2nd patch by Steven.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:55 ` Petr Mladek
@ 2018-01-13 7:31 ` Sergey Senozhatsky
  2018-01-15 8:51 ` Petr Mladek
  2018-01-15 12:08 ` Steven Rostedt
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-13 7:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/12/18 13:55), Petr Mladek wrote:
[..]
> > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > PREEMPT kernels than !PREEMPT ones.
>
> I would say that the patch improves also console_unlock() but only in
> non-preemttive context.
>
> By other words, it makes console_unlock() finite in preemptible context
> (limited by buffer size). It might still be unlimited in
> non-preemtible context.

could you elaborate a bit?

[..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.
> >
> > I would analyze that more before doing so. Because with my patch, I
> > think we make those that can do long prints (without triggering a
> > watchdog), the ones most likely doing the long prints.
>
> IMHO, it might make sense because it would help to see the messages
> faster. But I would prefer to handle this separately because it
> might also increase the risk of softlockups. Therefore it might
> cause regressions.
>
> We should also take into account the commit 8d91f8b15361dfb438ab6
> ("printk: do cond_resched() between lines while outputting to
> consoles"). It has the same effect for console_lock() callers.

I replied in another email.

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-13 7:31 ` Sergey Senozhatsky
@ 2018-01-15 8:51 ` Petr Mladek
  2018-01-15 9:48 ` Sergey Senozhatsky
  2018-01-16 5:16 ` Sergey Senozhatsky
  0 siblings, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 8:51 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> On (01/12/18 13:55), Petr Mladek wrote:
> [..]
> > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > PREEMPT kernels than !PREEMPT ones.
> >
> > I would say that the patch improves also console_unlock() but only in
> > non-preemttive context.
> >
> > By other words, it makes console_unlock() finite in preemptible context
> > (limited by buffer size). It might still be unlimited in
> > non-preemtible context.
>
> could you elaborate a bit?

Ah, I am sorry, I swapped the conditions. I meant that
console_unlock() is finite in non-preemptible context.

There are two possibilities if console_unlock() is in atomic context
and never sleeps. First, if there are new printk() callers, they could
take over the job. Second. if they are no more callers, the
current owner will release the lock after processing the existing
messages. In both situations, the current owner will not handle more
than the entire buffer. Therefore it is limited. We might argue
if it is enough. But the point is that it is limited which is
a step forward. And I think that you already agreed that this
was a step forward.

The chance of taking over the lock is lower when console_unlock()
owner could sleep. But then there is not a danger of a softlockup.
In each case, this patch did not make it worse. Could we agree
on this, please?

All in all, this patch improved one scenario and did not make
worse another one. We know that it does not fix everything.
But it is a step forward. Could we agree on this, please?

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
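[Editorial sketch, not part of the thread.] The two cases Petr describes for a console owner that never sleeps can be illustrated with a small single-threaded counting model. Everything below (LOGBUF_SLOTS, struct sim, sim_store(), sim_owner_pass()) is hypothetical and made up for illustration; it only counts messages and assumes that any caller arriving while the owner prints becomes the waiter. It is not the kernel implementation and ignores the real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log buffer capacity, counted in messages. */
#define LOGBUF_SLOTS 8

struct sim {
	size_t pending;	/* stored but not yet printed */
	size_t dropped;	/* rejected because the buffer was full */
};

/* Model of a printk() caller storing one message. */
static void sim_store(struct sim *s)
{
	if (s->pending < LOGBUF_SLOTS)
		s->pending++;
	else
		s->dropped++;
}

/*
 * Model of one console owner in atomic context (it never sleeps).
 * After every printed line, 'arrivals' other callers store one
 * message each. With 'handoff' set, the first such caller is assumed
 * to spin as the waiter and to take over the remaining backlog.
 * 'max_lines' only stops the no-handoff case from looping forever.
 * Returns the number of lines printed by this owner.
 */
static size_t sim_owner_pass(struct sim *s, size_t arrivals,
			     bool handoff, size_t max_lines)
{
	size_t printed = 0;

	assert(s->pending <= LOGBUF_SLOTS);

	while (s->pending > 0 && printed < max_lines) {
		s->pending--;	/* one line goes to the console */
		printed++;

		for (size_t i = 0; i < arrivals; i++)
			sim_store(s);

		if (handoff && arrivals > 0)
			break;	/* the waiter becomes the new owner */
	}

	return printed;
}
```

In this toy model, an owner that starts with a full buffer and sees no new callers prints exactly LOGBUF_SLOTS lines; with one new caller per line it prints a single line when hand-off is on, and only stops at the max_lines cap when hand-off is off.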
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 8:51 ` Petr Mladek
@ 2018-01-15 9:48 ` Sergey Senozhatsky
  2018-01-16 5:16 ` Sergey Senozhatsky
  1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-15 9:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky, Tejun Heo,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/15/18 09:51), Petr Mladek wrote:
> On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > On (01/12/18 13:55), Petr Mladek wrote:
> > [..]
> > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > PREEMPT kernels than !PREEMPT ones.
> > >
> > > I would say that the patch improves also console_unlock() but only in
> > > non-preemttive context.
> > >
> > > By other words, it makes console_unlock() finite in preemptible context
> > > (limited by buffer size). It might still be unlimited in
> > > non-preemtible context.
> >
> > could you elaborate a bit?
>
> Ah, I am sorry, I swapped the conditions. I meant that
> console_unlock() is finite in non-preemptible context.

ah, OK. yes. it still can be infinite, in preemptible context.

a side note, no kernel or user space process is designed to loop
in console_unlock(), so infinite console_unlock() still can do some
damage. we don't crash the kernel, but if we somehow bring down the
user space process, then things are not so clear. e.g. when we do
lots of handoffs we don't up() the console_sem, so anything that might
be sleeping in TASK_UNINTERRUPTIBLE on console_sem stays in that
uninterruptible state, which possibly can fire the hung task alarm,
which also may be configured to panic() the kernel (or some other type
of watchdog). so panic() is still possible even if we do hand offs.
but that's a completely different topic.

> There are two possibilities if console_unlock() is in atomic context
> and never sleeps. First, if there are new printk() callers, they could
> take over the job. Second. if they are no more callers, the
> current owner will release the lock after processing the existing
> messages. In both situations, the current owner will not handle more
> than the entire buffer. Therefore it is limited. We might argue
> if it is enough. But the point is that it is limited which is
> a step forward. And I think that you already agreed that this
> was a step forward.

yes. the question whether O(A * B) bound is good enough is still there,
but in the worst case it's still a lockup, just like before [including
cases of accidental hand off from non-atomic context to an atomic one].

> The chance of taking over the lock is lower when console_unlock()
> owner could sleep. But then there is not a danger of a softlockup.
> In each case, this patch did not make it worse. Could we agree
> on this, please?

yes.

> All in all, this patch improved one scenario and did not make
> worse another one. We know that it does not fix everything.
> But it is a step forward. Could we agree on this, please?

yes. it's iffy. it's a step forward when it's a step forward :)
and the good old lockup/panic in other cases. IMHO.

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 8:51 ` Petr Mladek
  2018-01-15 9:48 ` Sergey Senozhatsky
@ 2018-01-16 5:16 ` Sergey Senozhatsky
  2018-01-16 9:08 ` Petr Mladek
  1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 5:16 UTC (permalink / raw)
  To: Petr Mladek, Steven Rostedt
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On (01/15/18 09:51), Petr Mladek wrote:
> On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > On (01/12/18 13:55), Petr Mladek wrote:
> > [..]
> > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > PREEMPT kernels than !PREEMPT ones.
> > >
> > > I would say that the patch improves also console_unlock() but only in
> > > non-preemttive context.
> > >
> > > By other words, it makes console_unlock() finite in preemptible context
> > > (limited by buffer size). It might still be unlimited in
> > > non-preemtible context.
> >
> > could you elaborate a bit?
>
> Ah, I am sorry, I swapped the conditions. I meant that
> console_unlock() is finite in non-preemptible context.

by the way. just for the record,

probably there is a way for us to have a task printing more than
O(logbuf) even in non-preemptible context.

CPU0

vprintk_emit()
 preempt_disable()
  console_unlock()
  {
    for (;;) {
       printk_safe_enter_irqsave()
       call_console_drivers();
       printk_safe_exit_irqrestore()

       << IRQ >>
          dump_stack()
             printk()->log_store()
             ....
             printk()->log_store()
       << iret >>
    }
  }
 preempt_enable()

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
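[Editorial sketch, not part of the thread.] The scenario above can be put into numbers with a toy counting model: a single non-preemptible printing loop, with interrupts that arrive in the window after one line has been printed and store further messages. The names LOGBUF_SLOTS and sim_unlock_with_irqs() are hypothetical, the buffer is counted in whole messages, and no other CPU is assumed to be spinning as a waiter, so no hand-off happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log buffer capacity, counted in messages. */
#define LOGBUF_SLOTS 8

/*
 * Model of one non-preemptible console_unlock()-like loop.
 * 'pending' messages are in the buffer at the start. Each of the
 * first 'irqs' printed lines is followed by an interrupt that stores
 * up to 'irq_msgs' messages (extra ones are lost when the buffer is
 * full). Returns the total number of lines printed by the loop.
 */
static size_t sim_unlock_with_irqs(size_t pending, size_t irqs,
				   size_t irq_msgs)
{
	size_t printed = 0;

	assert(pending <= LOGBUF_SLOTS);

	while (pending > 0) {
		pending--;	/* one line printed with IRQs off */
		printed++;

		/* IRQs are enabled again: an interrupt may log more. */
		if (irqs > 0) {
			irqs--;
			for (size_t i = 0; i < irq_msgs; i++)
				if (pending < LOGBUF_SLOTS)
					pending++;
		}
	}

	return printed;
}
```

With a full buffer of 8 messages and 20 interrupts that each store one message, this model prints 28 lines in one uninterrupted pass, which is more than the buffer holds; without interrupts it prints 8.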
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-16 5:16 ` Sergey Senozhatsky
@ 2018-01-16 9:08 ` Petr Mladek
  0 siblings, 0 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 9:08 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Tue 2018-01-16 14:16:22, Sergey Senozhatsky wrote:
> On (01/15/18 09:51), Petr Mladek wrote:
> > On Sat 2018-01-13 16:31:00, Sergey Senozhatsky wrote:
> > > On (01/12/18 13:55), Petr Mladek wrote:
> > > [..]
> > > > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > > > PREEMPT kernels than !PREEMPT ones.
> > > >
> > > > I would say that the patch improves also console_unlock() but only in
> > > > non-preemttive context.
> > > >
> > > > By other words, it makes console_unlock() finite in preemptible context
> > > > (limited by buffer size). It might still be unlimited in
> > > > non-preemtible context.
> > >
> > > could you elaborate a bit?
> >
> > Ah, I am sorry, I swapped the conditions. I meant that
> > console_unlock() is finite in non-preemptible context.
>
> by the way. just for the record,
>
> probably there is a way for us to have a task printing more than
> O(logbuf) even in non-preemptible context.
>
> CPU0
>
> vprintk_emit()
>  preempt_disable()
>   console_unlock()
>   {
>     for (;;) {
>        printk_safe_enter_irqsave()
>        call_console_drivers();
>        printk_safe_exit_irqrestore()
>
>        << IRQ >>
>           dump_stack()
>              printk()->log_store()
>              ....
>              printk()->log_store()
>        << iret >>
>     }
>   }
>  preempt_enable()

Great catch! And good to know about it when designing further
improvements.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-12 12:55 ` Petr Mladek
  2018-01-13 7:31 ` Sergey Senozhatsky
@ 2018-01-15 12:08 ` Steven Rostedt
  2018-01-16 4:51 ` Sergey Senozhatsky
  1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-15 12:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm,
	Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
	Pavel Machek, linux-kernel

On Fri, 12 Jan 2018 13:55:37 +0100
Petr Mladek <pmladek@suse.com> wrote:

> > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > PREEMPT kernels than !PREEMPT ones.
>
> I would say that the patch improves also console_unlock() but only in
> non-preemttive context.
>
> By other words, it makes console_unlock() finite in preemptible context
> (limited by buffer size). It might still be unlimited in
> non-preemtible context.

Since I'm worried most about printk(), I would argue to make printk
console unlock always non-preempt.

	preempt_disable();
	if (console_trylock_spinning())
		console_unlock();
	preempt_enable();

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
  2018-01-15 12:08 ` Steven Rostedt
@ 2018-01-16 4:51 ` Sergey Senozhatsky
  0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 4:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, Tejun Heo, Sergey Senozhatsky,
	akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
	Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
	Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
	rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/15/18 07:08), Steven Rostedt wrote:
> On Fri, 12 Jan 2018 13:55:37 +0100
> Petr Mladek <pmladek@suse.com> wrote:
>
> > > I'm not fixing console_unlock(), I'm fixing printk(). BTW, all my
> > > kernels are CONFIG_PREEMPT (I'm a RT guy), my mind thinks more about
> > > PREEMPT kernels than !PREEMPT ones.
> >
> > I would say that the patch improves also console_unlock() but only in
> > non-preemttive context.
> >
> > By other words, it makes console_unlock() finite in preemptible context
> > (limited by buffer size). It might still be unlimited in
> > non-preemtible context.
>
> Since I'm worried most about printk(), I would argue to make printk
> console unlock always non-preempt.

+1

// The next stop is "victims of O(logbuf) memorial" station :)

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup 2018-01-12 12:21 ` Steven Rostedt 2018-01-12 12:55 ` Petr Mladek @ 2018-01-13 7:28 ` Sergey Senozhatsky 2018-01-15 10:17 ` Petr Mladek 2018-01-15 12:06 ` Steven Rostedt 1 sibling, 2 replies; 140+ messages in thread From: Sergey Senozhatsky @ 2018-01-13 7:28 UTC (permalink / raw) To: Steven Rostedt Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, Sergey Senozhatsky, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park, Pavel Machek, linux-kernel On (01/12/18 07:21), Steven Rostedt wrote: [..] > Yep, but I'm still not convinced you are seeing an issue with a single > printk. what do you mean by this? > An OOM does not do everything in one printk, it calls hundreds. > Having hundreds of printks is an issue, especially in critical sections. unless your console_sem owner is preempted. as long as it is preempted it doesn't really matter how many times we call printk from which CPUs and from which sections, but what matters - who is going to print that all out when console_sem is running again and how much time will it take. that's what I'm saying. [..] > > with slow serial console, call_console_drivers() takes enough time to > > to make preemption of a current console_sem owner right after it irqrestore() > > highly possible; unless there is a spinning console_waiter. which easily may > > not be there; but can come in while current console_sem is preempted, why not. > > so when preempted console_sem owner comes back - it suddenly has a whole bunch > > of new messages to print and on one to hand off printing to. in a super > > imperfect and ugly world, BTW, this is how console_unlock() still can be > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries > > I'm not fixing console_unlock(), I'm fixing printk(). I know. 
I'm fixing console_unlock(). because console_unlock() is its own thing. > > 4) the interesting thing here is that call_console_drivers() can > > cause console_sem owner to schedule even if it has handed off the > > ownership. because waiting CPU has to spin with local IRQs disabled > > as long as call_console_drivers() prints its message. so if consoles > > are slow, then the first thing the waiter will face after it receives > > the console_sem ownership and enables the IRQs is - preemption. > > If the waiter is preempted, that means its not in a critical section. > Isn't that what you want? see below. > > so hand off is not immediate. there is a possibility of re-scheduling > > between hand off and actual printing. so that "there is always an active > > printing CPU" is not quite true. > > > > vprintk_emit() > > { > > > > console_trylock_spinning(void) > > { > > printk_safe_enter_irqsave(flags); > > while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU > > cpu_relax(); > > printk_safe_exit_irqrestore(flags); > > ---> } > > | // preemptible up until printk_safe_enter_irqsave() in console_unlock() > > Again, this means the waiter is not in a critical section. Why do we > care? which is not what I was talking about. the point was that you said : .... and what about the : printks that haven't gotten out yet? Delay them to something else, and : if the machine were to crash in the transfer, we lost all that data. : : My method, there's really no delay between a hand off. There's always : an active CPU doing printing. It matches the current method which works : well for getting information out. A delayed approach will break that that is not true. we can have preemption "during" hand off. hand off, thus, is a "delayed approach", by definition. 
so if you consider the possibility of "if the machine were to crash in the transfer, we lost all that data" and if you consider this to be important [otherwise you wouldn't bring that up, would you] then the reality is that your patch has the same problem as printk_kthread. so very schematically, for hand-off it's something like if (... console_trylock_spinning()) // grabbed the ownership << ... preempted ... >> console_unlock(); for printk_kthread it's something like wake_up_process(printk_kthread); up(console_sem); in the later case we at least have console_sem unlocked. so any other CPU that might do printk() can grab the lock and emit the logbuf messages. but in case on hand-off, we have console_sem locked, so no printk() will be able to emit the messages, we need that specific task to become running. hence the following: [..] > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right > > thing after all. this was cryptic and misleading. sorry. some clarifications. what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 I think I badly broke printk() [some of paths]. I know what I tried to fix (and you don't have to explain to me what a lock up is) with that patch, but I don't think the patch ended up to be a clear win. a very simple explanation would be: instead of having a direct nonpreemptible path logbuf -> for(;;) call_console_drivers -> happy user we now have logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user which is a big change. with a non-zero potential for regressions. and it didn't take long to find out that not all "happy users" were exactly happy with the new scheme of things. glance through Tetsuo's emails [see links in my another email], Tetsuo reported that printk can stall for minutes now. basically, the worse the system state is the lower printk throughput can be [down to zero chars in the worst case]. that's why I think that my patch was a mistake. 
and that's why in my out-of-tree patches I'm moving towards the non-preemptible path from logbuf through console to a happy user [just like it used to be]. but, obviously, I can't just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's why I bound console_unlock() to watchdog threshold and move towards the batched non-preemptible print outs (enabling preemption and up()-ing the console_sem at the end of each print out batch). this is not super good, preemption is still here, but at least not after every line console_unlock() prints. up() console_sem also increases chances that, for instance, systemd or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem now has a chance to be woken up sooner (not only after we flush all pending logbuf messages and finally up() the console_sem). -ss ^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-13 7:28 ` Sergey Senozhatsky
@ 2018-01-15 10:17 ` Petr Mladek
2018-01-15 11:50 ` Petr Mladek
2018-01-16 5:23 ` Sergey Senozhatsky
2018-01-15 12:06 ` Steven Rostedt
1 sibling, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 10:17 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

Hi Sergey,

I wonder if there is still some misunderstanding.

Steven and I are trying to get this patch in because we believe
that it is a step forward. We know that it is not perfect. But
we believe that it makes things better. In particular, it limits
the time spent in console_unlock() in atomic context. It does
not make it worse in preemptible context.

It does not block further improvements, including offloading
to the kthread. We will happily discuss and review further
improvements, if they prove to be necessary.

The advantage of this approach is that it is incremental. It should
be easier to review and to analyze for possible regressions.

What is the aim of your mails, please?
Do you want to say that this patch might cause regressions?
Or do you want to say that it does not solve all scenarios?

Please, answer the above questions. I am still confused.


On Sat 2018-01-13 16:28:34, Sergey Senozhatsky wrote:
> On (01/12/18 07:21), Steven Rostedt wrote:
> [..]
> > Yep, but I'm still not convinced you are seeing an issue with a single
> > printk.
>
> what do you mean by this?
>
> > An OOM does not do everything in one printk, it calls hundreds.
> > Having hundreds of printks is an issue, especially in critical sections.
>
> unless your console_sem owner is preempted. as long as it is preempted
> it doesn't really matter how many times we call printk from which CPUs
> and from which sections, but what matters - who is going to print that all
> out when console_sem is running again and how much time will it take.
> that's what I'm saying.

Yes, this is a problem. We might need to solve it. But the same
problem is there even without the patch. Therefore we might solve
it later. Do you agree, please?

> [..]
> > > with slow serial console, call_console_drivers() takes enough time to
> > > to make preemption of a current console_sem owner right after it irqrestore()
> > > highly possible; unless there is a spinning console_waiter. which easily may
> > > not be there; but can come in while current console_sem is preempted, why not.
> > > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > > of new messages to print and on one to hand off printing to. in a super
> > > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
> >
> > I'm not fixing console_unlock(), I'm fixing printk().
>
> which is not what I was talking about. the point was that you said
>
>
> : .... and what about the
> : printks that haven't gotten out yet? Delay them to something else, and
> : if the machine were to crash in the transfer, we lost all that data.
> :
> : My method, there's really no delay between a hand off. There's always
> : an active CPU doing printing. It matches the current method which works
> : well for getting information out. A delayed approach will break that
>
>
> that is not true. we can have preemption "during" hand off. hand off,
> thus, is a "delayed approach", by definition. so if you consider the
> possibility of "if the machine were to crash in the transfer, we lost
> all that data" and if you consider this to be important [otherwise you
> wouldn't bring that up, would you] then the reality is that your patch
> has the same problem as printk_kthread.
>
> so very schematically, for hand-off it's something like
>
> 	if (... console_trylock_spinning()) // grabbed the ownership
>
> 		<< ... preempted ... >>
>
> 		console_unlock();
>
>
> for printk_kthread it's something like
>
> 	wake_up_process(printk_kthread);
> 	up(console_sem);

Good question! Is this really the same?

The console_trylock_spinning() caller will get preempted only
when interrupts (timers?) still work. This is a sign that the system
is still somehow living. Also this information is quite up-to-date
because you checked this after a relatively short busy wait.

On the other hand, wake_up_process() just puts printk_kthread
into a running state. It does not check if the processes are
still actively being rescheduled on the system. It might check
some flags. But they might be pretty outdated when this is
done after half of the watchdog limit.

In any case, the preemption after console_trylock_spinning() has
the same effect as preemption in console_unlock(). It is possible
already now. Therefore I do not consider this a regression.


> hence the following:
>
> [..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.
>
> this was cryptic and misleading. sorry.
> some clarifications.
>
> what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> I think I badly broke printk() [some of paths]. I know what I tried
> to fix (and you don't have to explain to me what a lock up is) with
> that patch, but I don't think the patch ended up to be a clear win.
> a very simple explanation would be:
>
> instead of having a direct nonpreemptible path
>
>    logbuf -> for(;;) call_console_drivers -> happy user
>
> we now have
>
>    logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
>
> which is a big change. with a non-zero potential for regressions.
> and it didn't take long to find out that not all "happy users" were
> exactly happy with the new scheme of things. glance through Tetsuo's
> emails [see links in my another email], Tetsuo reported that printk can
> stall for minutes now. basically, the worse the system state is the lower
> printk throughput can be [down to zero chars in the worst case]. that's
> why I think that my patch was a mistake. and that's why in my out-of-tree
> patches I'm moving towards the non-preemptible path from logbuf through
> console to a happy user [just like it used to be]. but, obviously, I can't
> just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> why I bound console_unlock() to watchdog threshold and move towards the
> batched non-preemptible print outs (enabling preemption and up()-ing the
> console_sem at the end of each print out batch). this is not super good,
> preemption is still here, but at least not after every line console_unlock()
> prints. up() console_sem also increases chances that, for instance, systemd
> or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> now has a chance to be woken up sooner (not only after we flush all pending
> logbuf messages and finally up() the console_sem).

I see your point. But this is an orthogonal problem. It is more
about losing messages because console_unlock() is slow when
sleeping. This patch is about limiting time spent in
console_unlock() in atomic context.

If you want to revert the above mentioned commit, please
send a patch so that we could discuss this separately.

Best Regards,
Petr

PS: Sergey, you have many good points. The printk-stuff is very
complex and we could spend years discussing the perfect solution.
But I am never sure whether you discuss this in this thread because
this patch might cause a regression or because it does not address
all the issues. Could we please make it simpler?

If you believe that this patch might cause a regression then please
say this clearly. You actually mentioned the word regression a few
times. I am not sure if we managed to persuade you of the opposite.

If you think that this patch is not good enough and not worth
merging upstream, please state this clearly as well.

If you think that this patch does not address all problems,
please send further improvements on top of it so that we could
discuss this.

If you want to discuss the problems in advance, please open
another thread. IMHO, this thread brought many ideas for
the perfect solution but it is already too scattered.

Best Regards,
Petr
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 10:17 ` Petr Mladek
@ 2018-01-15 11:50 ` Petr Mladek
2018-01-16 6:10 ` Sergey Senozhatsky
2018-01-16 5:23 ` Sergey Senozhatsky
1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 11:50 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> PS: Sergey, you have many good points. The printk-stuff is very
> complex and we could spend years discussing the perfect solution.

BTW: One solution that comes to my mind is based on ideas
already mentioned in this thread:

void console_unlock(void)
{
	disable_preemtion();

	while(pending_message) {

	    call_console_drivers();

	    if (too_long_here() && current != printk_kthread) {
	       wake_up_process(printk_kthread())

	}

	enable_preemtion();
}

bool too_long_here(void)
{
	return should_resched();
or
	return spent_here() > 1 / HZ / 2;
or
	what ever we agree on
}


int printk_kthread_func(void *data)
{
	while(1) {
		 if (!pending_messaged)
			schedule();

		if (console_trylock_spinning())
			console_unlock();

		cond_resched();
	}
}

It means that console_unlock() will aggressively push messages
with disabled preemption. It will wake up printk_kthread when
it has been pushing for too long. The printk_kthread would try
to steal the lock and take over the job.

If the system is in a reasonable state, printk_kthread should
succeed and avoid a softlockup. The offload should be safer
than a pure wake_up_process().

If printk_kthread is not able to take over the job, it might
suggest that the offload is not safe and the softlockup
is inevitable.

One question is how to avoid a softlockup when console_unlock()
is called from printk_kthread. I think that printk_kthread should
release console_lock and call cond_resched() from time to time.
It means that the printing will be less aggressive but anyone
could continue flushing the console. If there are no new messages,
it is probably acceptable to be less aggressive with flushing
the messages.

Anyway, this should be safer than a direct offload if we
agree that getting the messages out is more important than
a possible softlockup.

If this is not enough, I would start thinking about throttling
writers.

Finally, this is all future work that can be done and
discussed later.

Best Regards,
Petr
^ permalink raw reply [flat|nested] 140+ messages in thread
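The offload idea in the message above can be modelled in user space. What follows is only a minimal, single-threaded sketch under loud assumptions: `struct console_model`, `flush_as_caller()` and `flush_as_kthread()` are hypothetical names that do not exist in the kernel, a message count stands in for the time check that `too_long_here()` would do, and the hand-off to the helper is assumed to succeed every time (the thread itself notes that it may not).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct console_model {
	size_t pending;      /* messages still waiting in the log buffer */
	size_t printed;      /* messages pushed to the console so far */
	size_t budget;       /* messages a caller prints before asking for help */
	bool kthread_woken;  /* set once the helper has been asked to take over */
};

/* Stand-in for too_long_here(): a message count instead of a time check. */
static bool too_long_here(size_t printed_by_caller, size_t budget)
{
	return printed_by_caller >= budget;
}

/*
 * Model of console_unlock() called by an ordinary printk() caller.
 * Prints until the buffer is empty or the budget is used up, then
 * "wakes" the helper. Assumption: the helper takes over right away.
 * Returns the number of messages this caller printed itself.
 */
static size_t flush_as_caller(struct console_model *m)
{
	size_t mine = 0;

	while (m->pending > 0) {
		m->pending--;              /* one call_console_drivers() pass */
		m->printed++;
		mine++;
		if (m->pending > 0 && too_long_here(mine, m->budget)) {
			m->kthread_woken = true;   /* wake_up_process() stand-in */
			break;                     /* model: helper steals the lock */
		}
	}
	return mine;
}

/*
 * Model of the helper thread: it drains the buffer in batches and would
 * release the lock and reschedule between batches.
 * Returns the number of batches it needed.
 */
static size_t flush_as_kthread(struct console_model *m, size_t batch)
{
	size_t batches = 0;

	assert(batch > 0);
	while (m->pending > 0) {
		size_t n = m->pending < batch ? m->pending : batch;

		m->pending -= n;
		m->printed += n;
		batches++;                 /* unlock + cond_resched() would go here */
	}
	m->kthread_woken = false;
	return batches;
}
```

With ten pending messages and a budget of three, the caller in this model prints three messages and wakes the helper, which then drains the remaining seven. The bound on the caller is the point of the sketch; whether a real helper manages to take the lock in time is exactly the open question discussed in the replies.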
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 11:50 ` Petr Mladek
@ 2018-01-16 6:10 ` Sergey Senozhatsky
2018-01-16 9:36 ` Petr Mladek
2018-01-16 16:06 ` Steven Rostedt
0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 6:10 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Byungchul Park, Pavel Machek, linux-kernel

Hi,

On (01/15/18 12:50), Petr Mladek wrote:
> On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> > PS: Sergey, you have many good points. The printk-stuff is very
> > complex and we could spend years discussing the perfect solution.
>
> BTW: One solution that comes to my mind is based on ideas
> already mentioned in this thread:
>
> void console_unlock(void)
> {
> 	disable_preemtion();
>
> 	while(pending_message) {
>
> 	    call_console_drivers();
>
> 	    if (too_long_here() && current != printk_kthread) {
> 	       wake_up_process(printk_kthread())
>
> 	}
>
> 	enable_preemtion();
> }

unfortunately disabling preemption in console_unlock() is a bit
dangerous :( we have paths that call console_unlock() exactly
to flush everything (not only new pending messages, but everything)
that is in logbuf and we cannot return from console_unlock()
prematurely in that case.

> bool too_long_here(void)
> {
> 	return should_resched();
> or
> 	return spent_here() > 1 / HZ / 2;
> or
> 	what ever we agree on
> }
>
>
> int printk_kthread_func(void *data)
> {
> 	while(1) {
> 		 if (!pending_messaged)
> 			schedule();
>
> 		if (console_trylock_spinning())
> 			console_unlock();
>
> 		cond_resched();
> 	}
> }

overall that's very close to what I have in one of my private branches.
console_trylock_spinning() for some reason does not perform really
well on my made-up internal printk torture tests. it seems that I
have a much better stability (no lockups and so on) when I also let
printk_kthread sleep on console_sem(). but I will look further.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 6:10 ` Sergey Senozhatsky
@ 2018-01-16 9:36 ` Petr Mladek
2018-01-16 10:10 ` Sergey Senozhatsky
2018-01-16 16:06 ` Steven Rostedt
1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 9:36 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Steven Rostedt, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Tue 2018-01-16 15:10:13, Sergey Senozhatsky wrote:
> Hi,
>
> On (01/15/18 12:50), Petr Mladek wrote:
> > On Mon 2018-01-15 11:17:43, Petr Mladek wrote:
> > > PS: Sergey, you have many good points. The printk-stuff is very
> > > complex and we could spend years discussing the perfect solution.
> >
> > BTW: One solution that comes to my mind is based on ideas
> > already mentioned in this thread:
> >
> > void console_unlock(void)
> > {
> > 	disable_preemtion();
> >
> > 	while(pending_message) {
> >
> > 	    call_console_drivers();
> >
> > 	    if (too_long_here() && current != printk_kthread) {
> > 	       wake_up_process(printk_kthread())
> >
> > 	}
> >
> > 	enable_preemtion();
> > }
>
> unfortunately disabling preemtion in console_unlock() is a bit
> dangerous :( we have paths that call console_unlock() exactly
> to flush everything (not only new pending messages, but everything)
> that is in logbuf and we cannot return from console_unlock()
> preliminary in that case.

You are right. Just to be sure. Are you talking about replaying
the entire log when a new console is registered? Or do you know
about more paths?

If I get it correctly, we allow handing off the lock even when
replaying the entire log. But you are right that we should
enable preemption in this case because there are many messages
even without printk() activity.

IMHO, the best solution would be to replay the log in a
separate process asynchronously and not block existing
consoles in the meantime. But I am not sure if it is worth
the complexity. Anyway, it is future work.


> > bool too_long_here(void)
> > {
> > 	return should_resched();
> > or
> > 	return spent_here() > 1 / HZ / 2;
> > or
> > 	what ever we agree on
> > }
> >
> >
> > int printk_kthread_func(void *data)
> > {
> > 	while(1) {
> > 		 if (!pending_messaged)
> > 			schedule();
> >
> > 		if (console_trylock_spinning())
> > 			console_unlock();
> >
> > 		cond_resched();
> > 	}
> > }
>
> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I
> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.

I believe that it is not trivial. console_trylock_spinning() is
tricky and the timing is important.

For example, it might be tricky if a torture test affects the normal
workflow by many interrupts. We might need to call even more
console_unlock() code with spinning enabled to improve the success
ratio. Another problem is that the kthread must be scheduled on
another CPU.

And so on. I believe that there are many more problems and areas
for improvement.

Best Regards,
Petr
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 9:36 ` Petr Mladek
@ 2018-01-16 10:10 ` Sergey Senozhatsky
0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 10:10 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Byungchul Park, Pavel Machek, linux-kernel

On (01/16/18 10:36), Petr Mladek wrote:
[..]
> > unfortunately disabling preemtion in console_unlock() is a bit
> > dangerous :( we have paths that call console_unlock() exactly
> > to flush everything (not only new pending messages, but everything)
> > that is in logbuf and we cannot return from console_unlock()
> > preliminary in that case.
>
> You are right. Just to be sure. Are you talking about replaying
> the entire log when a new console is registered? Or do you know
> about more paths?

to the best of my knowledge CON_PRINTBUFFER is the only thing that
explicitly states
	"I want everything that's in logbuf, even if it has been already
	 printed on other consoles"

the rest want to have only pending messages, so we can offload from there.


CON_PRINTBUFFER registration can happen any time. e.g. via modprobe
netconsole. we can be up and running for some time when netconsole joins
in, so that CON_PRINTBUFFER thing can be painful.


> If I get it correctly, we allow to hand off the lock even when
> replying the entire log. But you are right that we should
> enable preemption in this case because there are many messages
> even without printk() activity.

> IMHO, the best solution would be to reply the log in a
> separate process asynchronously and do not block existing
> consoles in the meantime. But I am not sure if it is worth
> the complexity. Anyway, it is a future work.

[..]
> > > int printk_kthread_func(void *data)
> > > {
> > > 	while(1) {
> > > 		 if (!pending_messaged)
> > > 			schedule();
> > >
> > > 		if (console_trylock_spinning())
> > > 			console_unlock();
> > >
> > > 		cond_resched();
> > > 	}
> > > }
> >
> > overall that's very close to what I have in one of my private branches.
> > console_trylock_spinning() for some reason does not perform really
> > well on my made-up internal printk torture tests. it seems that I
> > have a much better stability (no lockups and so on) when I also let
> > printk_kthread to sleep on console_sem(). but I will look further.
>
> I believe that it is not trivial. console_trylock_spinning() is
> tricky and the timing is important.

yes, timing seems to be very important.
*as far as I can see from the traces on my printk torture tests*

> For example, it might be tricky if a torture test affects the normal
> workflow by many interrupts. We might need to call even more
> console_unlock() code with spinning enabled to improve the success
> ratio. Another problem is that the kthread must be scheduled on
> another CPU.

yes, I always schedule it on another CPU [if any].

> And so on. I believe that there are many more problems and areas
> for improvement.

right.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 6:10 ` Sergey Senozhatsky
2018-01-16 9:36 ` Petr Mladek
@ 2018-01-16 16:06 ` Steven Rostedt
1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-16 16:06 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Tue, 16 Jan 2018 15:10:13 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> overall that's very close to what I have in one of my private branches.
> console_trylock_spinning() for some reason does not perform really
> well on my made-up internal printk torture tests. it seems that I

One thing I noticed in my test with the module that does printks on all
cpus, was that the patch spreads out the processing of the consoles.
Before my patch, one printk user would be doing all the work, and all
the other printks only had to load their data into the logbuf then
exit. The majority of printks took a few microseconds, which looks
great if you ignore the one worker that is taking milliseconds to
complete. After my patch, since a printk that comes in while another
one was running would block, then it would start printing, it did
lengthen the time for individual printks to finish. Worst case it would
double the time to do printk. But it removed the burden of a single
printk doing all the work for all new printks that came in.

In other words, I would expect this to make printk on average slower.
But no longer unlimited.

-- Steve

> have a much better stability (no lockups and so on) when I also let
> printk_kthread to sleep on console_sem(). but I will look further.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 10:17 ` Petr Mladek
2018-01-15 11:50 ` Petr Mladek
@ 2018-01-16 5:23 ` Sergey Senozhatsky
1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 5:23 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, Tetsuo Handa,
rostedt, Byungchul Park, Pavel Machek, linux-kernel

Hi,

On (01/15/18 11:17), Petr Mladek wrote:
> Hi Sergey,
>
> I wonder if there is still some miss understanding.
>
> Steven and me are trying to get this patch in because we believe
> that it is a step forward. We know that it is not perfect. But
> we believe that it makes things better. In particular, it limits
> the time spent in console_unlock() in atomic context. It does
> not make it worse in preemptible context.
>
> It does not block further improvements, including offloading
> to the kthread. We will happily discuss and review further
> improvements, it they prove to be necessary.
>
> The advantage of this approach is that it is incremental. It should
> be easier for review and analyzing possible regressions.
>
> What is the aim of your mails, please?
> Do you want to say that this patch might cause regressions?
> Or do you want to say that it does not solve all scenarios?
>
> Please, answer the above questions. I am still confused.

I ACK-ed the patch set, given that I hope that we at least will do (a)

a) remove preemption out of printk()->user critical path

---

b) the next thing would be - O(logbuf) is not a good boundary

c) the thing after that would be to - O(logbuf) boundary can be broken
   in both preemptible and non-preemptible contexts.

but (b) and (c) can wait.

	-ss
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-13 7:28 ` Sergey Senozhatsky
2018-01-15 10:17 ` Petr Mladek
@ 2018-01-15 12:06 ` Steven Rostedt
2018-01-15 14:45 ` Petr Mladek
2018-01-16 1:46 ` Sergey Senozhatsky
1 sibling, 2 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-15 12:06 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Sat, 13 Jan 2018 16:28:34 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> On (01/12/18 07:21), Steven Rostedt wrote:
> [..]
> > Yep, but I'm still not convinced you are seeing an issue with a single
> > printk.
>
> what do you mean by this?

I'm not sure your issues happen because a single printk is locked up,
but because you have many printks in one area.

>
> > An OOM does not do everything in one printk, it calls hundreds.
> > Having hundreds of printks is an issue, especially in critical sections.
>
> unless your console_sem owner is preempted. as long as it is preempted
> it doesn't really matter how many times we call printk from which CPUs
> and from which sections, but what matters - who is going to print that all
> out when console_sem is running again and how much time will it take.
> that's what I'm saying.

OK, if this is an issue, then we could do:

	preempt_disable();
	if (console_trylock_spinning())
		console_unlock();
	preempt_enable();

Which would prevent any printks from being preempted, but allow for
other console_lock owners to be so.

>
> [..]
> > > with slow serial console, call_console_drivers() takes enough time to
> > > to make preemption of a current console_sem owner right after it irqrestore()
> > > highly possible; unless there is a spinning console_waiter. which easily may
> > > not be there; but can come in while current console_sem is preempted, why not.
> > > so when preempted console_sem owner comes back - it suddenly has a whole bunch
> > > of new messages to print and on one to hand off printing to. in a super
> > > imperfect and ugly world, BTW, this is how console_unlock() still can be
> > > O(infinite): schedule between the printed lines [even !PREEMPT kernel tries
> >
> > I'm not fixing console_unlock(), I'm fixing printk().
>
> I know. I'm fixing console_unlock(). because console_unlock() is its own
> thing.
>
> > > 4) the interesting thing here is that call_console_drivers() can
> > > cause console_sem owner to schedule even if it has handed off the
> > > ownership. because waiting CPU has to spin with local IRQs disabled
> > > as long as call_console_drivers() prints its message. so if consoles
> > > are slow, then the first thing the waiter will face after it receives
> > > the console_sem ownership and enables the IRQs is - preemption.
> >
> > If the waiter is preempted, that means its not in a critical section.
> > Isn't that what you want?
>
> see below.
>
> > > so hand off is not immediate. there is a possibility of re-scheduling
> > > between hand off and actual printing. so that "there is always an active
> > > printing CPU" is not quite true.
> > >
> > > vprintk_emit()
> > > {
> > >
> > >   console_trylock_spinning(void)
> > >   {
> > >      printk_safe_enter_irqsave(flags);
> > >      while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU
> > >         cpu_relax();
> > >      printk_safe_exit_irqrestore(flags);
> > > ---> }
> > > |    // preemptible up until printk_safe_enter_irqsave() in console_unlock()
> >
> > Again, this means the waiter is not in a critical section. Why do we
> > care?
>
> which is not what I was talking about. the point was that you said

And would be fixed with the preempt_disable() I added above.

>
>
> : .... and what about the
> : printks that haven't gotten out yet? Delay them to something else, and
> : if the machine were to crash in the transfer, we lost all that data.
> :
> : My method, there's really no delay between a hand off. There's always
> : an active CPU doing printing. It matches the current method which works
> : well for getting information out. A delayed approach will break that
>
>
> that is not true. we can have preemption "during" hand off. hand off,
> thus, is a "delayed approach", by definition. so if you consider the
> possibility of "if the machine were to crash in the transfer, we lost
> all that data" and if you consider this to be important [otherwise you
> wouldn't bring that up, would you] then the reality is that your patch
> has the same problem as printk_kthread.

With the preempt_disable() there really isn't a delay. I agree, we
shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
but that's another story).

>
> so very schematically, for hand-off it's something like
>
> 	if (... console_trylock_spinning()) // grabbed the ownership
>
> 		<< ... preempted ... >>
>
> 		console_unlock();

Which I think we should stop, with the preempt_disable().

>
>
> for printk_kthread it's something like
>
> 	wake_up_process(printk_kthread);
> 	up(console_sem);
>
>
> in the later case we at least have console_sem unlocked. so any other CPU
> that might do printk() can grab the lock and emit the logbuf messages. but
> in case on hand-off, we have console_sem locked, so no printk() will be
> able to emit the messages, we need that specific task to become running.
>
>
> hence the following:
>
> [..]
> > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > thing after all.
>
> this was cryptic and misleading. sorry.
> some clarifications.
>
> what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> I think I badly broke printk() [some of paths]. I know what I tried

I think adding the preempt_disable() would fix printk() but let non
printk console_unlock() still preempt.

> to fix (and you don't have to explain to me what a lock up is) with
> that patch, but I don't think the patch ended up to be a clear win.
> a very simple explanation would be:
>
> instead of having a direct nonpreemptible path
>
>    logbuf -> for(;;) call_console_drivers -> happy user
>
> we now have
>
>    logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
>
> which is a big change. with a non-zero potential for regressions.
> and it didn't take long to find out that not all "happy users" were
> exactly happy with the new scheme of things. glance through Tetsuo's
> emails [see links in my another email], Tetsuo reported that printk can
> stall for minutes now. basically, the worse the system state is the lower
> printk throughput can be [down to zero chars in the worst case]. that's
> why I think that my patch was a mistake. and that's why in my out-of-tree
> patches I'm moving towards the non-preemptible path from logbuf through
> console to a happy user [just like it used to be]. but, obviously, I can't
> just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> why I bound console_unlock() to watchdog threshold and move towards the
> batched non-preemptible print outs (enabling preemption and up()-ing the
> console_sem at the end of each print out batch). this is not super good,
> preemption is still here, but at least not after every line console_unlock()
> prints. up() console_sem also increases chances that, for instance, systemd
> or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> now has a chance to be woken up sooner (not only after we flush all pending
> logbuf messages and finally up() the console_sem).

I'd rather try simpler approaches first (like adding the preempt_disable()
on top of my patch) than an elaborate scheme of printk_kthreads.

-- Steve
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 12:06 ` Steven Rostedt
@ 2018-01-15 14:45 ` Petr Mladek
2018-01-16 2:23 ` Sergey Senozhatsky
2018-01-16 1:46 ` Sergey Senozhatsky
1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-15 14:45 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Mon 2018-01-15 07:06:37, Steven Rostedt wrote:
> On Sat, 13 Jan 2018 16:28:34 +0900
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> > On (01/12/18 07:21), Steven Rostedt wrote:
> >
> > > An OOM does not do everything in one printk, it calls hundreds.
> > > Having hundreds of printks is an issue, especially in critical sections.
> >
> > unless your console_sem owner is preempted. as long as it is preempted
> > it doesn't really matter how many times we call printk from which CPUs
> > and from which sections, but what matters - who is going to print that all
> > out when console_sem is running again and how much time will it take.
> > that's what I'm saying.
>
> OK, if this is an issue, then we could do:
>
> 	preempt_disable();
> 	if (console_trylock_spinning())
> 		console_unlock();
> 	preempt_enable();
>
> Which would prevent any printks from being preempted, but allow for
> other console_lock owners to be so.

[...]

> > : .... and what about the
> > : printks that haven't gotten out yet? Delay them to something else, and
> > : if the machine were to crash in the transfer, we lost all that data.
> > :
> > : My method, there's really no delay between a hand off. There's always
> > : an active CPU doing printing. It matches the current method which works
> > : well for getting information out. A delayed approach will break that
> >
> >
> > that is not true. we can have preemption "during" hand off. hand off,
> > thus, is a "delayed approach", by definition. so if you consider the
> > possibility of "if the machine were to crash in the transfer, we lost
> > all that data" and if you consider this to be important [otherwise you
> > wouldn't bring that up, would you] then the reality is that your patch
> > has the same problem as printk_kthread.
>
> With the preempt_disable() there really isn't a delay. I agree, we
> shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> but that's another story).
>
> >
> > so very schematically, for hand-off it's something like
> >
> > 	if (... console_trylock_spinning()) // grabbed the ownership
> >
> > 		<< ... preempted ... >>
> >
> > 		console_unlock();
>
> Which I think we should stop, with the preempt_disable().

Adding the preempt_disable() basically means reverting the already
mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
for some of console_trylock() callers").

I originally wanted to solve this separately to make it easier. But
the change looks fine to me. Therefore we reached a mutual agreement.
Sergey, do you want to send a patch or should I just put it at
the end of this patchset?


> > for printk_kthread it's something like
> >
> > 	wake_up_process(printk_kthread);
> > 	up(console_sem);
> >
> >
> > in the latter case we at least have console_sem unlocked. so any other CPU
> > that might do printk() can grab the lock and emit the logbuf messages. but
> > in case of hand-off, we have console_sem locked, so no printk() will be
> > able to emit the messages, we need that specific task to become running.
> >
> >
> > hence the following:
> >
> > [..]
> > > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > > thing after all.
> >
> > this was cryptic and misleading. sorry.
> > some clarifications.
> >
> > what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> > I think I badly broke printk() [some of paths]. I know what I tried
>
> I think adding the preempt_disable() would fix printk() but let non
> printk console_unlock() still preempt.

I would personally remove cond_resched() from console_unlock()
completely.

Sleeping in console_unlock() increases the chance that more messages
would need to be handled. And more importantly it reduces the chance
of a successful handover.

As a result, the caller might spend a very long time there, it might
be getting increasingly far behind. There is a higher risk of lost
messages. Also the eventual taker might have too much to process
in preemption disabled context.

Removing cond_resched() is in sync with printk() priorities.
The highest one is to get the messages out.

Finally, removing cond_resched() should make the behavior more
predictable (never preempted), same in all situations (called
from printk() or other locations) => easier to analyze problems
and maintain.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 14:45 ` Petr Mladek
@ 2018-01-16 2:23 ` Sergey Senozhatsky
2018-01-16 4:47 ` Sergey Senozhatsky
2018-01-16 10:13 ` Petr Mladek
0 siblings, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 2:23 UTC (permalink / raw)
To: Petr Mladek
Cc: Steven Rostedt, Sergey Senozhatsky, Sergey Senozhatsky, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/15/18 15:45), Petr Mladek wrote:
[..]
> > With the preempt_disable() there really isn't a delay. I agree, we
> > shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> > but that's another story).
> >
> > >
> > > so very schematically, for hand-off it's something like
> > >
> > > 	if (... console_trylock_spinning()) // grabbed the ownership
> > >
> > > 		<< ... preempted ... >>
> > >
> > > 		console_unlock();
> >
> > Which I think we should stop, with the preempt_disable().
>
> Adding the preempt_disable() basically means to revert the already
> mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
> for some of console_trylock() callers").
>
> I originally wanted to solve this separately to make it easier. But
> the change looks fine to me. Therefore we reached a mutual agreement.
> Sergey, do you want to send a patch or should I just put it at
> the end of this patchset?

you can add the patch.

[..]
> > I think adding the preempt_disable() would fix printk() but let non
> > printk console_unlock() still preempt.
>
> I would personally remove cond_resched() from console_unlock()
> completely.

hmm, not so sure. I think it's there for !PREEMPT systems which have
to print a lot of messages. the case I'm speaking about in particular
is when we register a CON_PRINTBUFFER console and need to console_unlock()
(flush) all of the messages we currently have in the logbuf. we better
have that cond_resched() there, I think.

> Sleeping in console_unlock() increases the chance that more messages
> would need to be handled. And more importantly it reduces the chance
> of a successful handover.
>
> As a result, the caller might spend there very long time, it might
> be getting increasingly far behind. There is higher risk of lost
> messages. Also the eventual taker might have too much to proceed
> in preemption disabled context.

yes.

> Removing cond_resched() is in sync with printk() priorities.

hmm, not sure. we have sleeping console_lock()->console_unlock() path
for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
have the same sleeping console_lock()->console_unlock().

printk()->console_unlock() seems to be a pretty independent thing,
unfortunately (!), yet sleeping console_lock()->console_unlock()
messes up with it a lot.

> The highest one is to get the messages out.
>
> Finally, removing cond_resched() should make the behavior more
> predictable (never preempted)

but we are always preempted in PREEMPT kernels when the current
console_sem owner acquired the lock via console_lock(), not via
console_trylock(). cond_resched() does the same, but for !PREEMPT.

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 2:23 ` Sergey Senozhatsky
@ 2018-01-16 4:47 ` Sergey Senozhatsky
2018-01-16 10:19 ` Petr Mladek
2018-01-16 15:45 ` Steven Rostedt
2018-01-16 10:13 ` Petr Mladek
1 sibling, 2 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 4:47 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, Tetsuo Handa
Cc: Sergey Senozhatsky, Tejun Heo, akpm, linux-mm, Cong Wang,
Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel, Sergey Senozhatsky

On (01/16/18 11:23), Sergey Senozhatsky wrote:
[..]
> > Adding the preempt_disable() basically means to revert the already
> > mentioned commit 6b97a20d3a7909daa06625 ("printk: set may_schedule
> > for some of console_trylock() callers").
> >
> > I originally wanted to solve this separately to make it easier. But
> > the change looks fine to me. Therefore we reached a mutual agreement.
> > Sergey, do you want to send a patch or should I just put it at
> > the end of this patchset?
>
> you can add the patch.

if you don't mind, let me fix the thing that I broke.
that would be responsible. I believe I also must say the following:
Tetsuo, many thanks for reporting the issues for so long, and
sorry that it took quite a while to revert that change.

8<====

From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Subject: [PATCH] printk: never set console_may_schedule in console_trylock()

This patch, basically, reverts commit 6b97a20d3a79 ("printk:
set may_schedule for some of console_trylock() callers").
That commit was a mistake, it introduced a big dependency
on the scheduler, by enabling preemption under console_sem
in printk()->console_unlock() path, which is rather too
critical. The patch did not significantly reduce the
possibilities of printk() lockups, but made it possible to
stall printk(), as has been reported by Tetsuo Handa [1].

Another issue is that preemption under console_sem also
messes up with Steven Rostedt's hand off scheme, by making
it possible to sleep with console_sem both in console_unlock()
and in vprintk_emit(), after acquiring the console_sem
ownership (anywhere between printk_safe_exit_irqrestore() in
console_trylock_spinning() and printk_safe_enter_irqsave()
in console_unlock()). This makes hand off less likely and,
at the same time, may result in a significant amount of
pending logbuf messages. Preempted console_sem owner makes
it impossible for other CPUs to emit logbuf messages, but
does not make it impossible for other CPUs to append new
messages to the logbuf.

Reinstate the old behavior and make printk() non-preemptible.
Should any printk() lockup reports arrive they must be handled
in a different way.

[1] https://marc.info/?l=linux-mm&m=145692016122716
Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 kernel/printk/printk.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ffe05024c622..9cb943c90d98 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	/* If called from the scheduler, we can not call up(). */
 	if (!in_sched) {
+		/*
+		 * Disable preemption to avoid being preempted while holding
+		 * console_sem which would prevent anyone from printing to
+		 * console
+		 */
+		preempt_disable();
 		/*
 		 * Try to acquire and then immediately release the console
 		 * semaphore. The release will print out buffers and wake up
@@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 */
 		if (console_trylock_spinning())
 			console_unlock();
+		preempt_enable();
 	}
 
 	return printed_len;
@@ -2229,20 +2236,7 @@ int console_trylock(void)
 		return 0;
 	}
 	console_locked = 1;
-	/*
-	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
-	 * safe to schedule (e.g. calling printk while holding a spin_lock),
-	 * because preempt_disable()/preempt_enable() are just barriers there
-	 * and preempt_count() is always 0.
-	 *
-	 * RCU read sections have a separate preemption counter when
-	 * PREEMPT_RCU enabled thus we must take extra care and check
-	 * rcu_preempt_depth(), otherwise RCU read sections modify
-	 * preempt_count().
-	 */
-	console_may_schedule = !oops_in_progress &&
-			preemptible() &&
-			!rcu_preempt_depth();
+	console_may_schedule = 0;
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
-- 
2.15.1

^ permalink raw reply related [flat|nested] 140+ messages in thread
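The effect of the patch above on the flush path can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical and chosen to echo the kernel identifiers, locking is reduced to a flag, and a counter stands in for `cond_resched()`. It assumes, as the thread describes, that the sleeping `console_lock()` path allows rescheduling during the flush while the patched `console_trylock()` path never does.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; nothing below is kernel code. */
static bool model_console_locked;
static bool model_console_may_schedule;
static int model_pending_records;	/* records waiting in the modeled logbuf */
static int model_resched_points;	/* how often the flusher offered to sleep */

/* Sleeping acquire: the caller is in a context where scheduling is fine. */
static void model_console_lock(void)
{
	model_console_locked = true;
	model_console_may_schedule = true;
}

/*
 * Non-sleeping acquire, the one used on the printk() path. With the
 * patch applied it never permits rescheduling during the flush.
 */
static bool model_console_trylock(void)
{
	if (model_console_locked)
		return false;
	model_console_locked = true;
	model_console_may_schedule = false;
	return true;
}

/* Flush every pending record, then release the lock. */
static void model_console_unlock(void)
{
	bool do_cond_resched = model_console_may_schedule;

	assert(model_console_locked);
	model_console_may_schedule = false;

	while (model_pending_records > 0) {
		model_pending_records--;	/* "emit" one record */
		if (do_cond_resched)
			model_resched_points++;	/* stands in for cond_resched() */
	}
	model_console_locked = false;
}
```

In this model, flushing three records after `model_console_trylock()` leaves `model_resched_points` at 0, while the same flush after `model_console_lock()` raises it to 3. That is the asymmetry discussed in the thread: the printk() path becomes non-preemptible, the `console_lock()` path stays preemptible.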
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 4:47 ` Sergey Senozhatsky
@ 2018-01-16 10:19 ` Petr Mladek
2018-01-17 2:24 ` Sergey Senozhatsky
2018-01-16 15:45 ` Steven Rostedt
1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 10:19 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel

On Tue 2018-01-16 13:47:16, Sergey Senozhatsky wrote:
> if you don't mind, let me fix the thing that I broke.
> that would be responsible. I believe I also must say the following:
> Tetsuo, many thanks for reporting the issues for so long, and
> sorry that it took quite a while to revert that change.
>
> 8<====
>
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
>
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
>
> Another issue is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
>
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
>
> [1] https://marc.info/?l=linux-mm&m=145692016122716
> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

IMHO, this is a step in the right direction.

Reviewed-by: Petr Mladek <pmladek@suse.com>

I'll wait for Steven's review and push this into printk.git.
I'll also add your Acks for the other patches.

Thanks for the patch and the various observations.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 10:19 ` Petr Mladek
@ 2018-01-17 2:24 ` Sergey Senozhatsky
0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17 2:24 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky,
Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/16/18 11:19), Petr Mladek wrote:
[..]
> > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>
> IMHO, this is a step in the right direction.
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
>
> I'll wait for Steven's review and push this into printk.git.
> I'll also add your Acks for the other patches.
>
> Thanks for the patch and the various observations.

thanks!

a side note,

our console output is still largely preemptible. a typical system
acquires console_sem via console_lock() all the time, so we still
can have "where is my printk output?" cases. for instance, my IDLE
PREEMPT x86 box, has the following stats

uptime 15 min

# of console_lock() calls: 10981 // can sleep under console_sem
# of vprintk_emit() calls: 825 // cannot sleep under console_sem

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 4:47 ` Sergey Senozhatsky
2018-01-16 10:19 ` Petr Mladek
@ 2018-01-16 15:45 ` Steven Rostedt
2018-01-17 2:18 ` Sergey Senozhatsky
1 sibling, 1 reply; 140+ messages in thread
From: Steven Rostedt @ 2018-01-16 15:45 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel

On Tue, 16 Jan 2018 13:47:16 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Subject: [PATCH] printk: never set console_may_schedule in console_trylock()
>
> This patch, basically, reverts commit 6b97a20d3a79 ("printk:
> set may_schedule for some of console_trylock() callers").
> That commit was a mistake, it introduced a big dependency
> on the scheduler, by enabling preemption under console_sem
> in printk()->console_unlock() path, which is rather too
> critical. The patch did not significantly reduce the
> possibilities of printk() lockups, but made it possible to
> stall printk(), as has been reported by Tetsuo Handa [1].
>
> Another issue is that preemption under console_sem also
> messes up with Steven Rostedt's hand off scheme, by making
> it possible to sleep with console_sem both in console_unlock()
> and in vprintk_emit(), after acquiring the console_sem
> ownership (anywhere between printk_safe_exit_irqrestore() in
> console_trylock_spinning() and printk_safe_enter_irqsave()
> in console_unlock()). This makes hand off less likely and,
> at the same time, may result in a significant amount of
> pending logbuf messages. Preempted console_sem owner makes
> it impossible for other CPUs to emit logbuf messages, but
> does not make it impossible for other CPUs to append new
> messages to the logbuf.
>
> Reinstate the old behavior and make printk() non-preemptible.
> Should any printk() lockup reports arrive they must be handled
> in a different way.
>
> [1] https://marc.info/?l=linux-mm&m=145692016122716

Especially since Konstantin is working on pulling in all LKML archives,
the above should be denoted as:

Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

Although the above is for linux-mm and not LKML (it still works), I
should ask Konstantin if he will be pulling in any of the other
archives. Perhaps have both? (in case marc.info goes away).

> Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")

Should we Cc stable@vger.kernel.org?

> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  kernel/printk/printk.c | 22 ++++++++--------------
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index ffe05024c622..9cb943c90d98 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1895,6 +1895,12 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	/* If called from the scheduler, we can not call up(). */
>  	if (!in_sched) {
> +		/*
> +		 * Disable preemption to avoid being preempted while holding
> +		 * console_sem which would prevent anyone from printing to
> +		 * console
> +		 */
> +		preempt_disable();
>  		/*
>  		 * Try to acquire and then immediately release the console
>  		 * semaphore. The release will print out buffers and wake up
> @@ -1902,6 +1908,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 */
>  		if (console_trylock_spinning())
>  			console_unlock();
> +		preempt_enable();
>  	}
>  
>  	return printed_len;
> @@ -2229,20 +2236,7 @@ int console_trylock(void)
>  		return 0;
>  	}
>  	console_locked = 1;
> -	/*
> -	 * When PREEMPT_COUNT disabled we can't reliably detect if it's
> -	 * safe to schedule (e.g. calling printk while holding a spin_lock),
> -	 * because preempt_disable()/preempt_enable() are just barriers there
> -	 * and preempt_count() is always 0.
> -	 *
> -	 * RCU read sections have a separate preemption counter when
> -	 * PREEMPT_RCU enabled thus we must take extra care and check
> -	 * rcu_preempt_depth(), otherwise RCU read sections modify
> -	 * preempt_count().
> -	 */
> -	console_may_schedule = !oops_in_progress &&
> -			preemptible() &&
> -			!rcu_preempt_depth();
> +	console_may_schedule = 0;
>  	return 1;
>  }
>  EXPORT_SYMBOL(console_trylock);

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Thanks Sergey!

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 15:45 ` Steven Rostedt
@ 2018-01-17 2:18 ` Sergey Senozhatsky
2018-01-17 13:04 ` Petr Mladek
0 siblings, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17 2:18 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Petr Mladek, Tetsuo Handa, Sergey Senozhatsky,
Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/16/18 10:45), Steven Rostedt wrote:
[..]
> > [1] https://marc.info/?l=linux-mm&m=145692016122716
>
> Especially since Konstantin is working on pulling in all LKML archives,
> the above should be denoted as:
>
> Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp

hm, may I ask why? is there a new rule now to percent-encode commit messages?

> Although the above is for linux-mm and not LKML (it still works), I
> should ask Konstantin if he will be pulling in any of the other
> archives. Perhaps have both? (in case marc.info goes away).
>
> > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
>
> Should we Cc stable@vger.kernel.org?

that's a good question... maybe yes, maybe no... I'd say this
change is "safer" when we have hand-off.

> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
>
> Thanks Sergey!

thanks.

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 2:18 ` Sergey Senozhatsky
@ 2018-01-17 13:04 ` Petr Mladek
2018-01-17 15:24 ` Steven Rostedt
2018-01-18 4:31 ` Sergey Senozhatsky
0 siblings, 2 replies; 140+ messages in thread
From: Petr Mladek @ 2018-01-17 13:04 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel

On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> On (01/16/18 10:45), Steven Rostedt wrote:
> [..]
> > > [1] https://marc.info/?l=linux-mm&m=145692016122716
> >
> > Especially since Konstantin is working on pulling in all LKML archives,
> > the above should be denoted as:
> >
> > Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
>
> hm, may I ask why? is there a new rule now to percent-encode commit messages?

IMHO, the most important thing is that Steven's link is based
on the Message-ID and the stable redirector
https://lkml.kernel.org/. It has a better chance to work
even in the future.

I have been asked by other people to use this type of link
as well.


> > Although the above is for linux-mm and not LKML (it still works), I
> > should ask Konstantin if he will be pulling in any of the other
> > archives. Perhaps have both? (in case marc.info goes away).
> >
> > > Fixes: 6b97a20d3a79 ("printk: set may_schedule for some of console_trylock() callers")
> >
> > Should we Cc stable@vger.kernel.org?
>
> that's a good question... maybe yes, maybe no... I'd say this
> change is "safer" when we have hand-off.

I would keep it as is in stable kernels unless there are
many bug reports.

Best Regards,
Petr

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 13:04 ` Petr Mladek
@ 2018-01-17 15:24 ` Steven Rostedt
2018-01-18 4:31 ` Sergey Senozhatsky
1 sibling, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-17 15:24 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel

On Wed, 17 Jan 2018 14:04:07 +0100
Petr Mladek <pmladek@suse.com> wrote:

> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]
> > > > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > >
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > >
> > > Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> >
> > hm, may I ask why? is there a new rule now to percent-encode commit messages?
>
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

Exactly. There's an effort to avoid any outside link dependencies in
the Linux git history. No one expected gmane to end (although it
appears to be making a comeback), but we don't want to get stuck if
marc.info disappears one day.

> > >
> > > Should we Cc stable@vger.kernel.org?
> >
> > that's a good question... maybe yes, maybe no... I'd say this
> > change is "safer" when we have hand-off.
>
> I would keep it as is in stable kernels unless there are
> many bug reports.

Agreed.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
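The Message-ID based link form discussed above can be sketched in a few lines of C. This is an illustration only: `build_link_tag()` is a hypothetical helper, not an existing tool, and it assumes the Message-ID contains no characters that need escaping in a URL. The redirector prefix is the https://lkml.kernel.org/r/ form already used in the cover letter of this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a Message-ID header value into a
 * "Link:" trailer that points at the lkml.kernel.org redirector.
 * Returns 0 on success, -1 if the result does not fit into 'out'.
 * No URL escaping is done; that is an assumption of this sketch.
 */
static int build_link_tag(const char *msgid, char *out, size_t outlen)
{
	size_t len;
	int n;

	assert(msgid != NULL && out != NULL);
	len = strlen(msgid);

	/* Drop the angle brackets of the raw header value, if present. */
	if (len >= 2 && msgid[0] == '<' && msgid[len - 1] == '>') {
		msgid++;
		len -= 2;
	}

	n = snprintf(out, outlen, "Link: https://lkml.kernel.org/r/%.*s",
		     (int)len, msgid);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with the Message-ID of Steven's v4 patch, `<20171108102723.602216b1@gandalf.local.home>`, the helper produces the same URL that the cover letter cites, prefixed with `Link: `. Because the link is derived from the Message-ID alone, it does not depend on any particular archive's internal numbering.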
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-17 13:04 ` Petr Mladek
2018-01-17 15:24 ` Steven Rostedt
@ 2018-01-18 4:31 ` Sergey Senozhatsky
2018-01-18 15:22 ` Steven Rostedt
1 sibling, 1 reply; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-18 4:31 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Steven Rostedt, Tetsuo Handa, Sergey Senozhatsky,
Tejun Heo, akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner,
Mel Gorman, Michal Hocko, Vlastimil Babka, Peter Zijlstra,
Linus Torvalds, Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/17/18 14:04), Petr Mladek wrote:
> On Wed 2018-01-17 11:18:56, Sergey Senozhatsky wrote:
> > On (01/16/18 10:45), Steven Rostedt wrote:
> > [..]
> > > > [1] https://marc.info/?l=linux-mm&m=145692016122716
> > >
> > > Especially since Konstantin is working on pulling in all LKML archives,
> > > the above should be denoted as:
> > >
> > > Link: http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL%20()%20I-love%20!%20SAKURA%20!%20ne%20!%20jp
> >
> > hm, may I ask why? is there a new rule now to percent-encode commit messages?
>
> IMHO, the most important thing is that Steven's link is based
> on the Message-ID and the stable redirector
> https://lkml.kernel.org/. It has a better chance to work
> even in the future.

d'oh... indeed, I copy-pasted the wrong URL... it should
have been lkml.kernel.org/r/ [and it actually was].

	-ss

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-18 4:31 ` Sergey Senozhatsky
@ 2018-01-18 15:22 ` Steven Rostedt
0 siblings, 0 replies; 140+ messages in thread
From: Steven Rostedt @ 2018-01-18 15:22 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, akpm,
linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, rostedt, Byungchul Park, Pavel Machek,
linux-kernel

On Thu, 18 Jan 2018 13:31:16 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> d'oh... indeed, I copy-pasted the wrong URL... it should
> have been lkml.kernel.org/r/ [and it actually was].

I've learned to do a copy after entering the lkml.kernel.org link into
the browser url, and before hitting enter. The redirection kills you.

-- Steve

^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 2:23 ` Sergey Senozhatsky
2018-01-16 4:47 ` Sergey Senozhatsky
@ 2018-01-16 10:13 ` Petr Mladek
2018-01-17 6:29 ` Sergey Senozhatsky
1 sibling, 1 reply; 140+ messages in thread
From: Petr Mladek @ 2018-01-16 10:13 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Steven Rostedt, Sergey Senozhatsky, Tejun Heo, akpm, linux-mm,
Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On Tue 2018-01-16 11:23:49, Sergey Senozhatsky wrote:
> On (01/15/18 15:45), Petr Mladek wrote:
> > > I think adding the preempt_disable() would fix printk() but let non
> > > printk console_unlock() still preempt.
> >
> > I would personally remove cond_resched() from console_unlock()
> > completely.
>
> hmm, not so sure. I think it's there for !PREEMPT systems which have
> to print a lot of messages. the case I'm speaking about in particular
> is when we register a CON_PRINTBUFFER console and need to console_unlock()
> (flush) all of the messages we currently have in the logbuf. we better
> have that cond_resched() there, I think.

Good point. I agree that we should keep the cond_resched() there
at least for now.

> > Sleeping in console_unlock() increases the chance that more messages
> > would need to be handled. And more importantly it reduces the chance
> > of a successful handover.
> >
> > As a result, the caller might spend there very long time, it might
> > be getting increasingly far behind. There is higher risk of lost
> > messages. Also the eventual taker might have too much to proceed
> > in preemption disabled context.
>
> yes.
>
> > Removing cond_resched() is in sync with printk() priorities.
>
> hmm, not sure. we have sleeping console_lock()->console_unlock() path
> for PREEMPT kernels, that cond_resched() makes the !PREEMPT kernels to
> have the same sleeping console_lock()->console_unlock().
>
> printk()->console_unlock() seems to be a pretty independent thing,
> unfortunately (!), yet sleeping console_lock()->console_unlock()
> messes up with it a lot.

IMHO, the problem here is that console_lock is used to synchronize
too many things. It would be great to move the printk() duties under
a separate lock in the long term.

Anyway, I see it the following way. Most console_lock() callers
do the following:

	void foo()
	{
		console_lock();
		foo_specific_work();
		console_unlock();
	}

where console_unlock() flushes the printk buffer before actually
releasing the lock.

IMHO, it would make sense if flushing the printk buffer behaves
the same when called either from printk() or from any other path.
I mean that it should be aggressive and allow an effective
hand off.

It should be safe as long as foo_specific_work() does not take
too much time.

From the other side: the cond_resched() in console_unlock() should
be obsoleted by the hand-shake code.

> > The highest one is to get the messages out.
> >
> > Finally, removing cond_resched() should make the behavior more
> > predictable (never preempted)
>
> but we are always preempted in PREEMPT kernels when the current
> console_sem owner acquired the lock via console_lock(), not via
> console_trylock(). cond_resched() does the same, but for !PREEMPT.

I agree that the situation is more complicated for cond_resched()
called after console_lock(). I do not insist on removing it now.

Just one more thing. The timeline looks like:

+ cond_resched added into console_unlock in v4.5-rc1, Jan 15, 2016
  (commit 8d91f8b15361dfb438ab6)

+ preemption enabled in printk in v4.6-rc1, Mar 17, 2016
  (commit 6b97a20d3a7909daa0662)

They both were obvious solutions that helped to reduce the risk
of soft-lockups. The first one handled evidently safe scenarios.
The second one was even more aggressive. I would say that
they both were more or less ad-hoc solutions that did not
take into account the other side effects (delaying output,
even losing messages).

I would not say that there is a diametric difference between them.
Therefore if we remove one for a reason, we should think about
reverting the other as well. But again. I am fine if we remove
only one now.

Does this make any sense?

Best Regards,
Petr
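
[Editorial sketch, not part of the thread] The foo() pattern Petr describes can be modelled in user space to show why any console_lock() holder ends up doing printk's flushing work on unlock. Everything below is an assumption made for illustration: the toy_* names are hypothetical, two counters stand in for the log buffer, and a trylock is used in place of the sleeping console_lock(). It is not how kernel/printk/printk.c is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for console_sem and the printk log buffer (hypothetical). */
struct toy_console {
	bool locked;     /* models console_sem being held */
	size_t pending;  /* records stored but not yet printed */
	size_t printed;  /* records pushed to the "console" so far */
};

/* Store a record; like printk(), storing never waits for the console. */
static void toy_log_store(struct toy_console *c)
{
	c->pending++;
}

/* Non-blocking acquire; the real console_lock() would sleep instead. */
static bool toy_console_trylock(struct toy_console *c)
{
	if (c->locked)
		return false;
	c->locked = true;
	return true;
}

/* Flush everything that is pending, and only then release the lock. */
static size_t toy_console_unlock(struct toy_console *c)
{
	size_t flushed = 0;

	assert(c->locked);
	while (c->pending > 0) {
		c->pending--;
		c->printed++;
		flushed++;
	}
	c->locked = false;
	return flushed;
}

/* The foo() pattern: unrelated work done while holding the console lock. */
static size_t toy_foo(struct toy_console *c, size_t logged_meanwhile)
{
	size_t i;

	if (!toy_console_trylock(c))
		return 0;
	/* foo_specific_work(): other contexts keep logging meanwhile. */
	for (i = 0; i < logged_meanwhile; i++)
		toy_log_store(c);
	/* The unlock pays for every record stored since the last flush. */
	return toy_console_unlock(c);
}
```

In this model the cost of toy_foo() grows with the number of records logged while it held the lock, which is the reason the thread cares about how long foo_specific_work() takes and whether the flush may be handed off.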
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-16 10:13 ` Petr Mladek
@ 2018-01-17 6:29 ` Sergey Senozhatsky
0 siblings, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-17 6:29 UTC (permalink / raw)
To: Petr Mladek
Cc: Sergey Senozhatsky, Steven Rostedt, Sergey Senozhatsky, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/16/18 11:13), Petr Mladek wrote:
[..]
> IMHO, it would make sense if flushing the printk buffer behaves
> the same when called either from printk() or from any other path.
> I mean that it should be aggressive and allow an effective
> hand off.
>
> It should be safe as long as foo_specific_work() does not take
> too much time.
>
> From the other side: the cond_resched() in console_unlock() should
> be obsoleted by the hand-shake code.

hm, let's not have too optimistic expectations.

hand off works only under very specific conditions. console is not
exclusively owned by printk, and console_sem is not printk's own lock.
even things like

	systemd -> n_tty_write -> do_output_char -> con_write

involve console_lock() and console_unlock(). IOW user space
logging/debugging can cause printk stalls, and vice versa.

by the way, do_con_write() explicitly calls
console_conditional_schedule() under console_sem, before it goes to
console_unlock(). so the scope of "situation normal, console_sem
locked, the owner scheduled out" is much bigger than just
vprintk_emit() -> console_unlock(). IMHO. and there are even more
things there. personally, I don't think that hand off is enough to
obsolete anything in that area.

[...]
> They both were obvious solutions that helped to reduce the risk
> of soft-lockups. The first one handled evidently safe scenarios.
> The second one was even more aggressive. I would say that
> they both were more or less ad-hoc solutions that did not
> take into account the other side effects (delaying output,
> even losing messages).

agreed.

> I would not say that there is a diametric difference between them.
> Therefore if we remove one for a reason, we should think about
> reverting the other as well. But again. I am fine if we remove
> only one now.
>
> Does this make any sense?

I see cond_resched() as a mirroring of console_lock()->console_unlock()
behaviour on PREEMPT systems, and as such it looks valid to me, so we
probably better keep it there. IMHO.

	-ss
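
[Editorial sketch, not part of the thread] Sergey's point that the hand off works only under specific conditions can be made concrete with a small predicate. It follows the design from the patch description (a waiter may start spinning only when there is an active console owner that is not the current task and no waiter is set yet). The struct, the field names, the integer task ids and toy_can_spin() are all hypothetical; the real code keeps this state under console_owner_lock.

```c
#include <stdbool.h>

#define TOY_NO_TASK (-1)

/* Toy snapshot of the hand-off state (hypothetical layout). */
struct toy_handoff {
	int owner;      /* task currently calling console drivers, or TOY_NO_TASK */
	bool waiter;    /* true once some task is already spinning for the lock */
	bool sem_held;  /* models console_sem being held by somebody */
};

/*
 * printk() side, after console_trylock() has failed: may task "me"
 * spin and wait for the lock to be handed over?
 */
static bool toy_can_spin(const struct toy_handoff *h, int me)
{
	if (!h->sem_held)
		return false;   /* nothing to wait for, just retry the trylock */
	if (h->owner == TOY_NO_TASK)
		return false;   /* holder is not printing: asleep, or in tty code */
	if (h->owner == me)
		return false;   /* recursion, never wait for ourselves */
	return !h->waiter;      /* only one waiter at a time */
}
```

The second check is the case raised in the mail: when console_sem is held by a task that is scheduled out or busy in a path such as con_write, no owner is set, so toy_can_spin() returns false and the messages wait until that task runs again.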
* Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
2018-01-15 12:06 ` Steven Rostedt
2018-01-15 14:45 ` Petr Mladek
@ 2018-01-16 1:46 ` Sergey Senozhatsky
1 sibling, 0 replies; 140+ messages in thread
From: Sergey Senozhatsky @ 2018-01-16 1:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
akpm, linux-mm, Cong Wang, Dave Hansen, Johannes Weiner, Mel Gorman,
Michal Hocko, Vlastimil Babka, Peter Zijlstra, Linus Torvalds,
Jan Kara, Mathieu Desnoyers, Tetsuo Handa, rostedt, Byungchul Park,
Pavel Machek, linux-kernel

On (01/15/18 07:06), Steven Rostedt wrote:
> > > Yep, but I'm still not convinced you are seeing an issue with a single
> > > printk.
> >
> > what do you mean by this?
>
> I'm not sure your issues happen because a single printk is locked up,
> but you have many printks in one area.

hm, need to think about it.

> > > An OOM does not do everything in one printk, it calls hundreds.
> > > Having hundreds of printks is an issue, especially in critical sections.
> >
> > unless your console_sem owner is preempted. as long as it is preempted
> > it doesn't really matter how many times we call printk from which CPUs
> > and from which sections, but what matters - who is going to print that all
> > out when console_sem is running again and how much time will it take.
> > that's what I'm saying.
>
> OK, if this is an issue, then we could do:
>
>	preempt_disable();
>	if (console_trylock_spinning())
>		console_unlock();
>	preempt_enable();
>
> Which would prevent any printks from being preempted, but allow for
> other console_lock owners to be so.

yes, non-preemptible printk->console_unlock() is good for a number
of reasons.

[..]
> > > > vprintk_emit()
> > > > {
> > > >
> > > >	console_trylock_spinning(void)
> > > >	{
> > > >		printk_safe_enter_irqsave(flags);
> > > >		while (READ_ONCE(console_waiter)) // spins as long as call_console_drivers() on other CPU
> > > >			cpu_relax();
> > > >		printk_safe_exit_irqrestore(flags);
> > > > --->	}
> > > > |	// preemptible up until printk_safe_enter_irqsave() in console_unlock()
> > >
> > > Again, this means the waiter is not in a critical section. Why do we
> > > care?
> >
> > which is not what I was talking about. the point was that you said
>
> And would be fixed with the preempt_disable() I added above.

yes. and it's, basically, very close to a revert of the commit
I mentioned.

[..]
> > that is not true. we can have preemption "during" hand off. hand off,
> > thus, is a "delayed approach", by definition. so if you consider the
> > possibility of "if the machine were to crash in the transfer, we lost
> > all that data" and if you consider this to be important [otherwise you
> > wouldn't bring that up, would you] then the reality is that your patch
> > has the same problem as printk_kthread.
>
> With the preempt_disable() there really isn't a delay. I agree, we
> shouldn't let printk preempt (unless we have CONFIG_PREEMPT_RT enabled,
> but that's another story).

yes.

> > so very schematically, for hand-off it's something like
> >
> >	if (... console_trylock_spinning()) // grabbed the ownership
> >
> >		<< ... preempted ... >>
> >
> >		console_unlock();
>
> Which I think we should stop, with the preempt_disable().

yes.

> > for printk_kthread it's something like
> >
> >	wake_up_process(printk_kthread);
> >	up(console_sem);
> >
> >
> > in the latter case we at least have console_sem unlocked. so any other CPU
> > that might do printk() can grab the lock and emit the logbuf messages. but
> > in case of hand-off, we have console_sem locked, so no printk() will be
> > able to emit the messages, we need that specific task to become running.
> >
> >
> > hence the following:
> >
> > > > [..]
> > > > reverting 6b97a20d3a7909daa06625d4440c2c52d7bf08d7 may be the right
> > > > thing after all.
> >
> > this was cryptic and misleading. sorry.
> > some clarifications.
> >
> > what I meant was that with 6b97a20d3a7909daa06625d4440c2c52d7bf08d7
> > I think I badly broke printk() [some of paths]. I know what I tried
>
> I think adding the preempt_disable() would fix printk() but let non
> printk console_unlock() still preempt.

yes. might be a bit risky, but can try. and yes, we still have
console_lock() call sites, which can sleep under console_sem, so
scheduler still can mess up with us, but that's a different story.
agreed.

> > to fix (and you don't have to explain to me what a lock up is) with
> > that patch, but I don't think the patch ended up to be a clear win.
> > a very simple explanation would be:
> >
> > instead of having a direct nonpreemptible path
> >
> >	logbuf -> for(;;) call_console_drivers -> happy user
> >
> > we now have
> >
> >	logbuf -> for(;;) { call_console_drivers, scheduler ... ???} -> happy user
> >
> > which is a big change. with a non-zero potential for regressions.
> > and it didn't take long to find out that not all "happy users" were
> > exactly happy with the new scheme of things. glance through Tetsuo's
> > emails [see links in my other email], Tetsuo reported that printk can
> > stall for minutes now. basically, the worse the system state is the lower
> > printk throughput can be [down to zero chars in the worst case]. that's
> > why I think that my patch was a mistake. and that's why in my out-of-tree
> > patches I'm moving towards the non-preemptible path from logbuf through
> > console to a happy user [just like it used to be]. but, obviously, I can't
> > just restore preempt_disable()/preempt_enable() in vprintk_emit(). that's
> > why I bound console_unlock() to watchdog threshold and move towards the
> > batched non-preemptible print outs (enabling preemption and up()-ing the
> > console_sem at the end of each print out batch). this is not super good,
> > preemption is still here, but at least not after every line console_unlock()
> > prints. up() console_sem also increases chances that, for instance, systemd
> > or any other task that is sleeping in TASK_UNINTERRUPTIBLE on console_sem
> > now has a chance to be woken up sooner (not only after we flush all pending
> > logbuf messages and finally up() the console_sem).
>
> I'd rather try simpler approaches first (like adding the preempt_disable()
> on top of my patch) than an elaborate scheme of printk_kthreads.

ok, agreed.

	-ss
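
[Editorial sketch, not part of the thread] The difference between "preemptible after every line" and the batched non-preemptible print outs mentioned in the quoted text can be illustrated by counting the points at which the printing task may be scheduled out. This is a toy model under stated assumptions: the batch is a fixed line count, whereas the mail describes a bound derived from the watchdog threshold; toy_flush() and struct toy_flush_stats are hypothetical names and do not correspond to any kernel function.

```c
#include <stddef.h>

/* Result of one simulated flush (hypothetical type). */
struct toy_flush_stats {
	size_t lines_printed;   /* lines pushed to the console */
	size_t resched_points;  /* places where the printer may be scheduled out */
};

/*
 * Print "pending" lines. Preemption is allowed only after each full
 * batch of "batch" lines; batch == 1 models a reschedule point after
 * every printed line.
 */
static struct toy_flush_stats toy_flush(size_t pending, size_t batch)
{
	struct toy_flush_stats st = { 0, 0 };
	size_t in_batch = 0;

	if (batch == 0)
		batch = 1;      /* avoid a batch that never completes */

	while (pending > 0) {
		pending--;
		st.lines_printed++;
		if (++in_batch == batch) {
			/* End of a batch: preemption would be enabled here. */
			st.resched_points++;
			in_batch = 0;
		}
	}
	return st;
}
```

For 10 pending lines, toy_flush(10, 1) yields 10 reschedule points while toy_flush(10, 4) yields 2, which is the trade-off described in the mail: preemption is still possible, but not after every line.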