* Regression in next with use printk_safe buffers in printk
@ 2017-02-13 18:59 Tony Lindgren
2017-02-14 16:01 ` Sergey Senozhatsky
0 siblings, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2017-02-13 18:59 UTC (permalink / raw)
To: Sergey Senozhatsky, Petr Mladek, Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel
Hi all,
Looks like commit f975237b7682 ("printk: use printk_safe buffers in
printk") causes "possible circular locking dependency detected " for
me on the first suspend.
Reverting the following four patches in next makes it go away:
d9c23523ed98 ("printk: drop call_console_drivers() unused param")
de6fcbdb68b2 ("printk: convert the rest to printk-safe")
8b1742c9c207 ("printk: remove zap_locks() function")
f975237b7682 ("printk: use printk_safe buffers in printk")
See below for the warning, #1 seems to be the relevant one as pointed
out by Peter.
Regards,
Tony
8< ---------------------
[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.
[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:
[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
[ 48.951721] Possible unsafe locking scenario:
[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***
[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843]
stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ #101
[ 48.951873] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)
[ 48.952911] PM: noirq resume of devices complete after 33.355 msecs
...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-13 18:59 Regression in next with use printk_safe buffers in printk Tony Lindgren
@ 2017-02-14 16:01 ` Sergey Senozhatsky
2017-02-14 16:18 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-14 16:01 UTC (permalink / raw)
To: Tony Lindgren
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, linux-kernel, Sergey Senozhatsky,
Rafael J. Wysocki, linux-pm
Hello,
Cc Rafael, just in case
On (02/13/17 10:59), Tony Lindgren wrote:
> Looks like commit f975237b7682 ("printk: use printk_safe buffers in
> printk") causes "possible circular locking dependency detected " for
> me on the first suspend.
thanks for the report.
> Reverting the following four patches in next makes it go away:
>
> d9c23523ed98 ("printk: drop call_console_drivers() unused param")
> de6fcbdb68b2 ("printk: convert the rest to printk-safe")
> 8b1742c9c207 ("printk: remove zap_locks() function")
> f975237b7682 ("printk: use printk_safe buffers in printk")
these patches basically just enable locked where it previously was
forcibly turned off. no timekeeping/pm/sched/etc code was modified.
can you share the link where Peter pointed out that this might be
caused by printk() changes?
so the report is
> kworker/0:0/3 is trying to acquire lock:
> (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
>
> but task is already holding lock:
>
> (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
>
> which lock already depends on the new lock.
some thoughts.
so there is a:
BAR -> try_to_wake_up()-> lock_hrtimer_base() ##hrtimer_bases.lock
chain (from #5).
and there is a:
FOO -> printk()->try_to_wake_up()
chain (from #2)
note, that printk() patches that you have mentioned don't add anything
new here. the call chain existed before, we just keep lockdep enabled now.
what #1 does is
timekeeping_resume()
lock timekeeper_lock
lock tk_core
tk_debug_account_sleep_time()
printk() << lockdep was disabled here before
try_to_wake_up()
lock_hrtimer_base() ##hrtimer_bases.lock
so it's tk_core -> hrtimer_bases.lock
and retrigger_next_event() does
retrigger_next_event()
lock hrtimer_bases
ktime_get_update_offsets_now()
lock tk_core
which is hrtimer_bases.lock -> tk_core
but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock
scenario (so it's not ABBA, but ABAD)
> lock(hrtimer_bases.lock);
> lock(&rt_b->rt_runtime_lock);
> lock(hrtimer_bases.lock);
> lock(tk_core);
>
>
> Chain exists of:
>
> tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock
come from?
By the way, as a side note,
shouldn't tk_debug_account_sleep_time() do printk_deferred() instead of
'normal' printk()?
printk() calls from under timekeeping seqlock are not safe, aren't they?
and tk_debug_account_sleep_time() is under tk_core seq lock.
IOW, replace pr_info() in tk_debug_account_sleep_time() with something
like this
printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
(s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC);
---
diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index ca9fb800336b..b8f7146c3538 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t)
int bin = min(fls(t->tv_sec), NUM_BINS-1);
sleep_time_bin[bin]++;
- pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec,
+ printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
+ (s64)t->tv_sec,
t->tv_nsec / NSEC_PER_MSEC);
}
---
I'll keep looking at the report more in the meantime.
-ss
> 8< ---------------------
> [ 48.950592] ======================================================
> [ 48.950622] [ INFO: possible circular locking dependency detected ]
> [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
> [ 48.950622] -------------------------------------------------------
> [ 48.950622] kworker/0:0/3 is trying to acquire lock:
> [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
> [ 48.950683]
> but task is already holding lock:
> [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
> [ 48.950714]
> which lock already depends on the new lock.
>
> [ 48.950714]
> the existing dependency chain (in reverse order) is:
> [ 48.950714]
> -> #5 (hrtimer_bases.lock){-.-...}:
> [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
> [ 48.950775] lock_hrtimer_base+0x28/0x58
> [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
> [ 48.950775] __enqueue_rt_entity+0x320/0x360
> [ 48.950805] enqueue_rt_entity+0x2c/0x44
> [ 48.950805] enqueue_task_rt+0x24/0x94
> [ 48.950836] ttwu_do_activate+0x54/0xc0
> [ 48.950836] try_to_wake_up+0x248/0x5c8
> [ 48.950836] __setup_irq+0x420/0x5f0
> [ 48.950836] request_threaded_irq+0xdc/0x184
> [ 48.950866] devm_request_threaded_irq+0x58/0xa4
> [ 48.950866] omap_i2c_probe+0x530/0x6a0
> [ 48.950897] platform_drv_probe+0x50/0xb0
> [ 48.950897] driver_probe_device+0x1f8/0x2cc
> [ 48.950897] __driver_attach+0xc0/0xc4
> [ 48.950927] bus_for_each_dev+0x6c/0xa0
> [ 48.950927] bus_add_driver+0x100/0x210
> [ 48.950927] driver_register+0x78/0xf4
> [ 48.950958] do_one_initcall+0x3c/0x16c
> [ 48.950958] kernel_init_freeable+0x20c/0x2d8
> [ 48.950958] kernel_init+0x8/0x110
> [ 48.950988] ret_from_fork+0x14/0x24
> [ 48.950988]
> -> #4 (&rt_b->rt_runtime_lock){-.-...}:
> [ 48.951019] _raw_spin_lock+0x40/0x50
> [ 48.951019] rq_offline_rt+0x9c/0x2bc
> [ 48.951019] set_rq_offline.part.2+0x2c/0x58
> [ 48.951049] rq_attach_root+0x134/0x144
> [ 48.951049] cpu_attach_domain+0x18c/0x6f4
> [ 48.951049] build_sched_domains+0xba4/0xd80
> [ 48.951080] sched_init_smp+0x68/0x10c
> [ 48.951080] kernel_init_freeable+0x160/0x2d8
> [ 48.951080] kernel_init+0x8/0x110
> [ 48.951080] ret_from_fork+0x14/0x24
> [ 48.951110]
> -> #3 (&rq->lock){-.-.-.}:
> [ 48.951110] _raw_spin_lock+0x40/0x50
> [ 48.951141] task_fork_fair+0x30/0x124
> [ 48.951141] sched_fork+0x194/0x2e0
> [ 48.951141] copy_process.part.5+0x448/0x1a20
> [ 48.951171] _do_fork+0x98/0x7e8
> [ 48.951171] kernel_thread+0x2c/0x34
> [ 48.951171] rest_init+0x1c/0x18c
> [ 48.951202] start_kernel+0x35c/0x3d4
> [ 48.951202] 0x8000807c
> [ 48.951202]
> -> #2 (&p->pi_lock){-.-.-.}:
> [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
> [ 48.951232] try_to_wake_up+0x30/0x5c8
> [ 48.951232] up+0x4c/0x60
> [ 48.951263] __up_console_sem+0x2c/0x58
> [ 48.951263] console_unlock+0x3b4/0x650
> [ 48.951263] vprintk_emit+0x270/0x474
> [ 48.951293] vprintk_default+0x20/0x28
> [ 48.951293] printk+0x20/0x30
> [ 48.951324] kauditd_hold_skb+0x94/0xb8
> [ 48.951324] kauditd_thread+0x1a4/0x56c
> [ 48.951324] kthread+0x104/0x148
> [ 48.951354] ret_from_fork+0x14/0x24
> [ 48.951354]
> -> #1 ((console_sem).lock){-.....}:
> [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
> [ 48.951385] down_trylock+0xc/0x2c
> [ 48.951385] __down_trylock_console_sem+0x24/0x80
> [ 48.951385] console_trylock+0x10/0x8c
> [ 48.951416] vprintk_emit+0x264/0x474
> [ 48.951416] vprintk_default+0x20/0x28
> [ 48.951416] printk+0x20/0x30
> [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
> [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
> [ 48.951446] timekeeping_resume+0x218/0x23c
> [ 48.951477] syscore_resume+0x94/0x42c
> [ 48.951477] suspend_enter+0x554/0x9b4
> [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
> [ 48.951507] enter_state+0x934/0xbd4
> [ 48.951507] pm_suspend+0x14/0x70
> [ 48.951507] state_store+0x68/0xc8
> [ 48.951538] kernfs_fop_write+0xf4/0x1f8
> [ 48.951538] __vfs_write+0x1c/0x114
> [ 48.951538] vfs_write+0xa0/0x168
> [ 48.951568] SyS_write+0x3c/0x90
> [ 48.951568] __sys_trace_return+0x0/0x10
> [ 48.951568]
> -> #0 (tk_core){----..}:
> [ 48.951599] lock_acquire+0xe0/0x294
> [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
> [ 48.951629] retrigger_next_event+0x4c/0x90
> [ 48.951629] on_each_cpu+0x40/0x7c
> [ 48.951629] clock_was_set_work+0x14/0x20
> [ 48.951660] process_one_work+0x2b4/0x808
> [ 48.951660] worker_thread+0x3c/0x550
> [ 48.951660] kthread+0x104/0x148
> [ 48.951690] ret_from_fork+0x14/0x24
> [ 48.951690]
> other info that might help us debug this:
>
> [ 48.951690] Chain exists of:
> tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
>
> [ 48.951721] Possible unsafe locking scenario:
>
> [ 48.951721] CPU0 CPU1
> [ 48.951721] ---- ----
> [ 48.951721] lock(hrtimer_bases.lock);
> [ 48.951751] lock(&rt_b->rt_runtime_lock);
> [ 48.951751] lock(hrtimer_bases.lock);
> [ 48.951751] lock(tk_core);
> [ 48.951782]
> *** DEADLOCK ***
>
> [ 48.951782] 3 locks held by kworker/0:0/3:
> [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
> [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
> [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
> [ 48.951843]
> stack backtrace:
> [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ #101
> [ 48.951873] Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [ 48.951904] Workqueue: events clock_was_set_work
> [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
> [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
> [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
> [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
> [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
> [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
> [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
> [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
> [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
> [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
> [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
> [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
> [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
> [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)
> [ 48.952911] PM: noirq resume of devices complete after 33.355 msecs
> ...
>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:01 ` Sergey Senozhatsky
@ 2017-02-14 16:18 ` Peter Zijlstra
2017-02-14 16:56 ` Sergey Senozhatsky
2017-02-14 16:54 ` Tony Lindgren
2017-02-15 18:01 ` Tony Lindgren
2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-02-14 16:18 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner,
linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm
On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote:
>
> but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock
> scenario (so it's not ABBA, but ABAD)
>
> > lock(hrtimer_bases.lock);
> > lock(&rt_b->rt_runtime_lock);
> > lock(hrtimer_bases.lock);
> > lock(tk_core);
> >
> >
> > Chain exists of:
> >
> > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
>
>
> I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock
> come from?
rt_b->rt_runtime_lock is one of the scheduler locks, since we do
printk() under tk_core, which does semaphore muck, which then includes
the entire scheduler chain of locks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:01 ` Sergey Senozhatsky
2017-02-14 16:18 ` Peter Zijlstra
@ 2017-02-14 16:54 ` Tony Lindgren
2017-02-15 18:01 ` Tony Lindgren
2 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2017-02-14 16:54 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner,
linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:03]:
> Hello,
>
> Cc Rafael, just in case
>
> On (02/13/17 10:59), Tony Lindgren wrote:
> > Looks like commit f975237b7682 ("printk: use printk_safe buffers in
> > printk") causes "possible circular locking dependency detected " for
> > me on the first suspend.
>
> thanks for the report.
>
> > Reverting the following four patches in next makes it go away:
> >
> > d9c23523ed98 ("printk: drop call_console_drivers() unused param")
> > de6fcbdb68b2 ("printk: convert the rest to printk-safe")
> > 8b1742c9c207 ("printk: remove zap_locks() function")
> > f975237b7682 ("printk: use printk_safe buffers in printk")
>
>
> these patches basically just enable locked where it previously was
> forcibly turned off. no timekeeping/pm/sched/etc code was modified.
> can you share the link where Peter pointed out that this might be
> caused by printk() changes?
Oh sorry I should have been more specific. Not much there to share,
I got redirected over to Peter's department in a private email thread
while chasing this issue. So that was just Peter's comment looking at
the log output.
> timekeeping_resume()
> lock timekeeper_lock
> lock tk_core
> tk_debug_account_sleep_time()
> printk() << lockdep was disabled here before
> try_to_wake_up()
> lock_hrtimer_base() ##hrtimer_bases.lock
Yeah above seems describe what changed, so your patch makes sense.
> shouldn't tk_debug_account_sleep_time() do printk_deferred() instead of
> 'normal' printk()?
> printk() calls from under timekeeping seqlock are not safe, aren't they?
> and tk_debug_account_sleep_time() is under tk_core seq lock.
>
> IOW, replace pr_info() in tk_debug_account_sleep_time() with something
> like this
>
> printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
> (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC);
Your patch below fixes the issue for me thanks. I had to apply it manually
though as tabs got replaced by spaces probably by some mail daemons.
Regards,
Tony
> ---
>
> diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
> index ca9fb800336b..b8f7146c3538 100644
> --- a/kernel/time/timekeeping_debug.c
> +++ b/kernel/time/timekeeping_debug.c
> @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t)
> int bin = min(fls(t->tv_sec), NUM_BINS-1);
>
> sleep_time_bin[bin]++;
> - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec,
> + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
> + (s64)t->tv_sec,
> t->tv_nsec / NSEC_PER_MSEC);
> }
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:18 ` Peter Zijlstra
@ 2017-02-14 16:56 ` Sergey Senozhatsky
2017-02-14 17:03 ` Tony Lindgren
2017-02-14 18:29 ` Peter Zijlstra
0 siblings, 2 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-14 16:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sergey Senozhatsky, Tony Lindgren, Petr Mladek, Steven Rostedt,
Thomas Gleixner, linux-kernel, Sergey Senozhatsky,
Rafael J. Wysocki, linux-pm, John Stultz
On (02/14/17 17:18), Peter Zijlstra wrote:
> On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote:
> >
> > but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock
> > scenario (so it's not ABBA, but ABAD)
> >
> > > lock(hrtimer_bases.lock);
> > > lock(&rt_b->rt_runtime_lock);
> > > lock(hrtimer_bases.lock);
> > > lock(tk_core);
> > >
> > >
> > > Chain exists of:
> > >
> > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
> >
> >
> > I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock
> > come from?
>
> rt_b->rt_runtime_lock is one of the scheduler locks, since we do
> printk() under tk_core, which does semaphore muck, which then includes
> the entire scheduler chain of locks.
thanks, Peter.
that crossed my mind, but I kinda assumed that we do printk() from
under tk_core using sched fair, and rt_runtime_lock is from sched rt.
so something like below, perhaps. would be helpful if Tony can test it.
(I'll send out this patch 'in a proper way' tomorrow, after some sleep,
it's 2am here).
8< ====
>From e1755b0bf7f8a0be5fdf4dd7303bf4cd150d9d20 Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Wed, 15 Feb 2017 01:42:18 +0900
Subject: [PATCH] time/timekeeping_debug: use printk_deferred()
Do not call printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
It's not safe to call printk() under tk_core, because console_sem
invokes scheduled (via wake_up_process()->activate_task()), which,
in turn, can call timekeeping code again, for instance, via
get_time()->ktime_get(). This may result in infinite loop on
tk_core.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
kernel/time/timekeeping_debug.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index ca9fb800336b..b8f7146c3538 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t)
int bin = min(fls(t->tv_sec), NUM_BINS-1);
sleep_time_bin[bin]++;
- pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec,
+ printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
+ (s64)t->tv_sec,
t->tv_nsec / NSEC_PER_MSEC);
}
--
2.11.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:56 ` Sergey Senozhatsky
@ 2017-02-14 17:03 ` Tony Lindgren
2017-02-15 4:44 ` Sergey Senozhatsky
2017-02-14 18:29 ` Peter Zijlstra
1 sibling, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2017-02-14 17:03 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Peter Zijlstra, Petr Mladek, Steven Rostedt, Thomas Gleixner,
linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm,
John Stultz
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:58]:
> On (02/14/17 17:18), Peter Zijlstra wrote:
> > On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote:
> > >
> > > but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock
> > > scenario (so it's not ABBA, but ABAD)
> > >
> > > > lock(hrtimer_bases.lock);
> > > > lock(&rt_b->rt_runtime_lock);
> > > > lock(hrtimer_bases.lock);
> > > > lock(tk_core);
> > > >
> > > >
> > > > Chain exists of:
> > > >
> > > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
> > >
> > >
> > > I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock
> > > come from?
> >
> > rt_b->rt_runtime_lock is one of the scheduler locks, since we do
> > printk() under tk_core, which does semaphore muck, which then includes
> > the entire scheduler chain of locks.
>
> thanks, Peter.
>
> that crossed my mind, but I kinda assumed that we do printk() from
> under tk_core using sched fair, and rt_runtime_lock is from sched rt.
>
>
> so something like below, perhaps. would be helpful if Tony can test it.
>
> (I'll send out this patch 'in a proper way' tomorrow, after some sleep,
> it's 2am here).
>
> 8< ====
>
> From e1755b0bf7f8a0be5fdf4dd7303bf4cd150d9d20 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Date: Wed, 15 Feb 2017 01:42:18 +0900
> Subject: [PATCH] time/timekeeping_debug: use printk_deferred()
>
> Do not call printk() from tk_debug_account_sleep_time(), because
> tk_debug_account_sleep_time() is called under tk_core seq lock.
> It's not safe to call printk() under tk_core, because console_sem
> invokes scheduled (via wake_up_process()->activate_task()), which,
> in turn, can call timekeeping code again, for instance, via
> get_time()->ktime_get(). This may result in infinite loop on
> tk_core.
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Thanks yeah this fixes the issue for me:
Tested-by: Tony Lindgren <tony@atomide.com>
> ---
> kernel/time/timekeeping_debug.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
> index ca9fb800336b..b8f7146c3538 100644
> --- a/kernel/time/timekeeping_debug.c
> +++ b/kernel/time/timekeeping_debug.c
> @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t)
> int bin = min(fls(t->tv_sec), NUM_BINS-1);
>
> sleep_time_bin[bin]++;
> - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec,
> + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n",
> + (s64)t->tv_sec,
> t->tv_nsec / NSEC_PER_MSEC);
> }
>
> --
> 2.11.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:56 ` Sergey Senozhatsky
2017-02-14 17:03 ` Tony Lindgren
@ 2017-02-14 18:29 ` Peter Zijlstra
2017-02-15 4:49 ` Sergey Senozhatsky
1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-02-14 18:29 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner,
linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm,
John Stultz
On Wed, Feb 15, 2017 at 01:56:45AM +0900, Sergey Senozhatsky wrote:
> that crossed my mind, but I kinda assumed that we do printk() from
> under tk_core using sched fair, and rt_runtime_lock is from sched rt.
That's all true; lockdep doesn't care :-) All it knows is that at some
point those locks nest.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 17:03 ` Tony Lindgren
@ 2017-02-15 4:44 ` Sergey Senozhatsky
0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-15 4:44 UTC (permalink / raw)
To: Tony Lindgren
Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt,
Thomas Gleixner, linux-kernel, Sergey Senozhatsky,
Rafael J. Wysocki, linux-pm, John Stultz
On (02/14/17 09:03), Tony Lindgren wrote:
[..]
> > Do not call printk() from tk_debug_account_sleep_time(), because
> > tk_debug_account_sleep_time() is called under tk_core seq lock.
> > It's not safe to call printk() under tk_core, because console_sem
> > invokes scheduled (via wake_up_process()->activate_task()), which,
> > in turn, can call timekeeping code again, for instance, via
> > get_time()->ktime_get(). This may result in infinite loop on
> > tk_core.
> >
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>
> Thanks yeah this fixes the issue for me:
>
> Tested-by: Tony Lindgren <tony@atomide.com>
thanks.
-ss
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 18:29 ` Peter Zijlstra
@ 2017-02-15 4:49 ` Sergey Senozhatsky
0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-15 4:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sergey Senozhatsky, Tony Lindgren, Petr Mladek, Steven Rostedt,
Thomas Gleixner, linux-kernel, Sergey Senozhatsky,
Rafael J. Wysocki, linux-pm, John Stultz
On (02/14/17 19:29), Peter Zijlstra wrote:
> On Wed, Feb 15, 2017 at 01:56:45AM +0900, Sergey Senozhatsky wrote:
> > that crossed my mind, but I kinda assumed that we do printk() from
> > under tk_core using sched fair, and rt_runtime_lock is from sched rt.
>
> That's all true; lockdep doesn't care :-) All it knows is that at some
> point those locks nest.
thanks.
I think I'll get more familiar with the lockdep splats in
coming months :) but it's good (well, so far) that now we
keep lockdep enabled in printk.
-ss
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-14 16:01 ` Sergey Senozhatsky
2017-02-14 16:18 ` Peter Zijlstra
2017-02-14 16:54 ` Tony Lindgren
@ 2017-02-15 18:01 ` Tony Lindgren
2017-02-16 1:31 ` Sergey Senozhatsky
2 siblings, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2017-02-15 18:01 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner,
linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm,
Russell King
Hi,
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:03]:
> Hello,
>
> Cc Rafael, just in case
Found another issue on booting ARM devices, so adding Russell too.
> On (02/13/17 10:59), Tony Lindgren wrote:
> > Looks like commit f975237b7682 ("printk: use printk_safe buffers in
> > printk") causes "possible circular locking dependency detected " for
> > me on the first suspend.
>
> thanks for the report.
>
> > Reverting the following four patches in next makes it go away:
> >
> > d9c23523ed98 ("printk: drop call_console_drivers() unused param")
> > de6fcbdb68b2 ("printk: convert the rest to printk-safe")
> > 8b1742c9c207 ("printk: remove zap_locks() function")
> > f975237b7682 ("printk: use printk_safe buffers in printk")
>
>
> these patches basically just enable locked where it previously was
> forcibly turned off. no timekeeping/pm/sched/etc code was modified.
Below is another issue I noticed caused by commit f975237b7682 that
I noticed during booting.
Regards,
Tony
8< --------------------------
[ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
[ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
[ 2.597686]
[ 2.597717] ===============================
[ 2.597717] [ ERR: suspicious RCU usage. ]
[ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted
[ 2.597717] -------------------------------
[ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
[ 2.597717]
[ 2.597717] other info that might help us debug this:
[ 2.597717]
[ 2.597717]
[ 2.597717] RCU used illegally from idle CPU!
[ 2.597717] rcu_scheduler_active = 2, debug_locks = 0
[ 2.597747] RCU used illegally from extended quiescent state!
[ 2.597747] 2 locks held by swapper/0/0:
[ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
[ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
[ 2.597747]
[ 2.597747] stack backtrace:
[ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
[ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
[ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
[ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
[ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
[ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
[ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
[ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
[ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
[ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
[ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
[ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
[ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
[ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
[ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
[ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
[ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
[ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-15 18:01 ` Tony Lindgren
@ 2017-02-16 1:31 ` Sergey Senozhatsky
2017-02-16 4:03 ` Tony Lindgren
0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-16 1:31 UTC (permalink / raw)
To: Tony Lindgren
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, linux-kernel, Sergey Senozhatsky,
Rafael J. Wysocki, linux-pm, Russell King
On (02/15/17 10:01), Tony Lindgren wrote:
[..]
> Below is another issue I noticed caused by commit f975237b7682 that
> I noticed during booting.
do you mean that with f975237b7682 you _always_ see that illegal RCU
usage warning?
> 8< --------------------------
> [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
> [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
> [ 2.597686]
> [ 2.597717] ===============================
> [ 2.597717] [ ERR: suspicious RCU usage. ]
> [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted
> [ 2.597717] -------------------------------
> [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
so this is trace_console(text, len) from
cpuidle_enter_state()
...
vprintk_emit() << error: Failed to enable monitor mode/CPU 0 failed to disable...
console_unlock()
call_console_drivers()
trace_console()
> [ 2.597717]
> [ 2.597717] other info that might help us debug this:
> [ 2.597717]
> [ 2.597717]
> [ 2.597717] RCU used illegally from idle CPU!
um... is this what rcuidle() for?
---
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 7180088cbb23..34da86e73d00 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
{
struct console *con;
- trace_console(text, len);
+ trace_console_rcuidle(text, len);
if (!console_drivers)
return;
---
I need to look more at this.
-ss
> [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0
> [ 2.597747] RCU used illegally from extended quiescent state!
> [ 2.597747] 2 locks held by swapper/0/0:
> [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
> [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
> [ 2.597747]
> [ 2.597747] stack backtrace:
> [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
> [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
> [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
> [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
> [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
> [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
> [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
> [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
> [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
> [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
> [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
> [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
> [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
> [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
> [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
> [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
> [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
> [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
> [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-16 1:31 ` Sergey Senozhatsky
@ 2017-02-16 4:03 ` Tony Lindgren
2017-02-16 4:25 ` Sergey Senozhatsky
0 siblings, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2017-02-16 4:03 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm,
Russell King
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> On (02/15/17 10:01), Tony Lindgren wrote:
> [..]
> > Below is another issue I noticed caused by commit f975237b7682 that
> > I noticed during booting.
>
> do you mean that with f975237b7682 you _always_ see that illegal RCU
> usage warning?
Yeah on every boot on devices using cpuidle_coupled.
> > 8< --------------------------
> > [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0.
> > [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch
> > [ 2.597686]
> > [ 2.597717] ===============================
> > [ 2.597717] [ ERR: suspicious RCU usage. ]
> > [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted
> > [ 2.597717] -------------------------------
> > [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
>
> so this is trace_console(text, len) from
>
> cpuidle_enter_state()
> ...
> vprintk_emit() << error: Failed to enable monitor mode/CPU 0 failed to disable...
> console_unlock()
> call_console_drivers()
> trace_console()
>
>
> > [ 2.597717]
> > [ 2.597717] other info that might help us debug this:
> > [ 2.597717]
> > [ 2.597717]
> > [ 2.597717] RCU used illegally from idle CPU!
>
> um... is this what rcuidle() for?
Yeah I think that's all there is to it. We've had a bunch of similar
issues pop up over past year :) See for example:
$ git log --author=McKenney drivers/base/power
> ---
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 7180088cbb23..34da86e73d00 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
> {
> struct console *con;
>
> - trace_console(text, len);
> + trace_console_rcuidle(text, len);
>
> if (!console_drivers)
> return;
>
> ---
>
>
> I need to look more at this.
That fixes it for me thanks, so feel free to add:
Tested-by: Tony Lindgren <tony@atomide.com>
> > [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0
> > [ 2.597747] RCU used illegally from extended quiescent state!
>
>
> > [ 2.597747] 2 locks held by swapper/0/0:
> > [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
> > [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
> > [ 2.597747]
> > [ 2.597747] stack backtrace:
> > [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
> > [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree)
> > [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
> > [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0)
> > [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650)
> > [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474)
> > [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28)
> > [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30)
> > [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288)
> > [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30)
> > [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80)
> > [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54)
> > [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208)
> > [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac)
> > [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c)
> > [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218)
> > [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c)
> > [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4)
> > [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-16 4:03 ` Tony Lindgren
@ 2017-02-16 4:25 ` Sergey Senozhatsky
2017-02-16 15:10 ` Tony Lindgren
0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-16 4:25 UTC (permalink / raw)
To: Tony Lindgren
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel,
Rafael J. Wysocki, linux-pm, Russell King
On (02/15/17 20:03), Tony Lindgren wrote:
> * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> > On (02/15/17 10:01), Tony Lindgren wrote:
> > [..]
> > > Below is another issue I noticed caused by commit f975237b7682 that
> > > I noticed during booting.
> >
> > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > usage warning?
>
> Yeah on every boot on devices using cpuidle_coupled.
does this mean that with the printk-safe patches reverted
(so, basically, the same conditions module 4 printk patches)
you don't see illegal RCU usage reports? at the moment I can't
see any connection between f975237b7682 and RCU usage from idle CPU.
[..]
> > ---
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 7180088cbb23..34da86e73d00 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
> > {
> > struct console *con;
> >
> > - trace_console(text, len);
> > + trace_console_rcuidle(text, len);
> >
> > if (!console_drivers)
> > return;
> >
> > ---
> >
> >
> > I need to look more at this.
>
> That fixes it for me thanks, so feel free to add:
>
> Tested-by: Tony Lindgren <tony@atomide.com>
thanks.
-ss
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-16 4:25 ` Sergey Senozhatsky
@ 2017-02-16 15:10 ` Tony Lindgren
2017-02-16 16:31 ` Sergey Senozhatsky
0 siblings, 1 reply; 16+ messages in thread
From: Tony Lindgren @ 2017-02-16 15:10 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm,
Russell King
* Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 20:26]:
> On (02/15/17 20:03), Tony Lindgren wrote:
> > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]:
> > > On (02/15/17 10:01), Tony Lindgren wrote:
> > > [..]
> > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > I noticed during booting.
> > >
> > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > usage warning?
> >
> > Yeah on every boot on devices using cpuidle_coupled.
>
> does this mean that with the printk-safe patches reverted
> (so, basically, the same conditions module 4 printk patches)
> you don't see illegal RCU usage reports? at the moment I can't
> see any connection between f975237b7682 and RCU usage from idle CPU.
Yes reverting those four patches I listed earlier also makes it go
away.
Regards,
Tony
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-16 15:10 ` Tony Lindgren
@ 2017-02-16 16:31 ` Sergey Senozhatsky
2017-02-16 19:13 ` Tony Lindgren
0 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2017-02-16 16:31 UTC (permalink / raw)
To: Tony Lindgren
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel,
Rafael J. Wysocki, linux-pm, Russell King
On (02/16/17 07:10), Tony Lindgren wrote:
[..]
> > > > [..]
> > > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > > I noticed during booting.
> > > >
> > > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > > usage warning?
> > >
> > > Yeah on every boot on devices using cpuidle_coupled.
> >
> > does this mean that with the printk-safe patches reverted
> > (so, basically, the same conditions module 4 printk patches)
> > you don't see illegal RCU usage reports? at the moment I can't
> > see any connection between f975237b7682 and RCU usage from idle CPU.
>
> Yes reverting those four patches I listed earlier also makes it go
> away.
aha... so, the previous RCU warning was simply suppressed by lockdep_off()
that we used to have in printk().
RCU dereference check
#define __rcu_dereference_check(p, c, space) \
({ \
/* Dependency order vs. p above. */ \
typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
rcu_dereference_sparse(p, space); \
((typeof(*p) __force __kernel *)(________p1)); \
})
where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage"
is
#define RCU_LOCKDEP_WARN(c, s) \
do { \
static bool __section(.data.unlikely) __warned; \
if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
__warned = true; \
lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \
} while (0)
where debug_lockdep_rcu_enabled()
int notrace debug_lockdep_rcu_enabled(void)
{
return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
current->lockdep_recursion == 0;
}
depends on lockdep state. and we just used to have
'current->lockdep_recursion != 0' here, because of lockdep_off()
in printk() around console_unlock(), which increments ->lockdep_recursion.
now we have lockdep enabled and the ->lockdep_recursion == 0.
so the RCU warning is valid and I need to Cc stable on that _rcuidle
patch, the tracepoint is pretty old. it's from 3.4
-ss
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk
2017-02-16 16:31 ` Sergey Senozhatsky
@ 2017-02-16 19:13 ` Tony Lindgren
0 siblings, 0 replies; 16+ messages in thread
From: Tony Lindgren @ 2017-02-16 19:13 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra,
Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm,
Russell King
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170216 08:33]:
> On (02/16/17 07:10), Tony Lindgren wrote:
> [..]
> > > > > [..]
> > > > > > Below is another issue I noticed caused by commit f975237b7682 that
> > > > > > I noticed during booting.
> > > > >
> > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU
> > > > > usage warning?
> > > >
> > > > Yeah on every boot on devices using cpuidle_coupled.
> > >
> > > does this mean that with the printk-safe patches reverted
> > > (so, basically, the same conditions module 4 printk patches)
> > > you don't see illegal RCU usage reports? at the moment I can't
> > > see any connection between f975237b7682 and RCU usage from idle CPU.
> >
> > Yes reverting those four patches I listed earlier also makes it go
> > away.
>
> aha... so, the previous RCU warning was simply suppressed by lockdep_off()
> that we used to have in printk().
>
>
> RCU dereference check
>
> #define __rcu_dereference_check(p, c, space) \
> ({ \
> /* Dependency order vs. p above. */ \
> typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
> RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \
> rcu_dereference_sparse(p, space); \
> ((typeof(*p) __force __kernel *)(________p1)); \
> })
>
>
> where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage"
> is
>
>
> #define RCU_LOCKDEP_WARN(c, s) \
> do { \
> static bool __section(.data.unlikely) __warned; \
> if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> __warned = true; \
> lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> } \
> } while (0)
>
>
>
> where debug_lockdep_rcu_enabled()
>
> int notrace debug_lockdep_rcu_enabled(void)
> {
> return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> current->lockdep_recursion == 0;
> }
>
> depends on lockdep state. and we just used to have
> 'current->lockdep_recursion != 0' here, because of lockdep_off()
> in printk() around console_unlock(), which increments ->lockdep_recursion.
>
> now we have lockdep enabled and the ->lockdep_recursion == 0.
>
>
> so the RCU warning is valid and I need to Cc stable on that _rcuidle
> patch, the tracepoint is pretty old. it's from 3.4
OK thanks for checking why it changed.
Regards,
Tony
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-02-16 19:13 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-13 18:59 Regression in next with use printk_safe buffers in printk Tony Lindgren
2017-02-14 16:01 ` Sergey Senozhatsky
2017-02-14 16:18 ` Peter Zijlstra
2017-02-14 16:56 ` Sergey Senozhatsky
2017-02-14 17:03 ` Tony Lindgren
2017-02-15 4:44 ` Sergey Senozhatsky
2017-02-14 18:29 ` Peter Zijlstra
2017-02-15 4:49 ` Sergey Senozhatsky
2017-02-14 16:54 ` Tony Lindgren
2017-02-15 18:01 ` Tony Lindgren
2017-02-16 1:31 ` Sergey Senozhatsky
2017-02-16 4:03 ` Tony Lindgren
2017-02-16 4:25 ` Sergey Senozhatsky
2017-02-16 15:10 ` Tony Lindgren
2017-02-16 16:31 ` Sergey Senozhatsky
2017-02-16 19:13 ` Tony Lindgren
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.