* Regression in next with use printk_safe buffers in printk @ 2017-02-13 18:59 Tony Lindgren 2017-02-14 16:01 ` Sergey Senozhatsky 0 siblings, 1 reply; 16+ messages in thread From: Tony Lindgren @ 2017-02-13 18:59 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek, Steven Rostedt Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel Hi all, Looks like commit f975237b7682 ("printk: use printk_safe buffers in printk") causes "possible circular locking dependency detected " for me on the first suspend. Reverting the following four patches in next makes it go away: d9c23523ed98 ("printk: drop call_console_drivers() unused param") de6fcbdb68b2 ("printk: convert the rest to printk-safe") 8b1742c9c207 ("printk: remove zap_locks() function") f975237b7682 ("printk: use printk_safe buffers in printk") See below for the warning, #1 seems to be the relevant one as pointed out by Peter. Regards, Tony 8< --------------------- [ 48.950592] ====================================================== [ 48.950622] [ INFO: possible circular locking dependency detected ] [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted [ 48.950622] ------------------------------------------------------- [ 48.950622] kworker/0:0/3 is trying to acquire lock: [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 [ 48.950683] but task is already holding lock: [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.950714] which lock already depends on the new lock. [ 48.950714] the existing dependency chain (in reverse order) is: [ 48.950714] -> #5 (hrtimer_bases.lock){-.-...}: [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 [ 48.950775] lock_hrtimer_base+0x28/0x58 [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 [ 48.950775] __enqueue_rt_entity+0x320/0x360 [ 48.950805] enqueue_rt_entity+0x2c/0x44 [ 48.950805] enqueue_task_rt+0x24/0x94 [ 48.950836] ttwu_do_activate+0x54/0xc0 [ 48.950836] try_to_wake_up+0x248/0x5c8 [ 48.950836] __setup_irq+0x420/0x5f0 [ 48.950836] request_threaded_irq+0xdc/0x184 [ 48.950866] devm_request_threaded_irq+0x58/0xa4 [ 48.950866] omap_i2c_probe+0x530/0x6a0 [ 48.950897] platform_drv_probe+0x50/0xb0 [ 48.950897] driver_probe_device+0x1f8/0x2cc [ 48.950897] __driver_attach+0xc0/0xc4 [ 48.950927] bus_for_each_dev+0x6c/0xa0 [ 48.950927] bus_add_driver+0x100/0x210 [ 48.950927] driver_register+0x78/0xf4 [ 48.950958] do_one_initcall+0x3c/0x16c [ 48.950958] kernel_init_freeable+0x20c/0x2d8 [ 48.950958] kernel_init+0x8/0x110 [ 48.950988] ret_from_fork+0x14/0x24 [ 48.950988] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [ 48.951019] _raw_spin_lock+0x40/0x50 [ 48.951019] rq_offline_rt+0x9c/0x2bc [ 48.951019] set_rq_offline.part.2+0x2c/0x58 [ 48.951049] rq_attach_root+0x134/0x144 [ 48.951049] cpu_attach_domain+0x18c/0x6f4 [ 48.951049] build_sched_domains+0xba4/0xd80 [ 48.951080] sched_init_smp+0x68/0x10c [ 48.951080] kernel_init_freeable+0x160/0x2d8 [ 48.951080] kernel_init+0x8/0x110 [ 48.951080] ret_from_fork+0x14/0x24 [ 48.951110] -> #3 (&rq->lock){-.-.-.}: [ 48.951110] _raw_spin_lock+0x40/0x50 [ 48.951141] task_fork_fair+0x30/0x124 [ 48.951141] sched_fork+0x194/0x2e0 [ 48.951141] copy_process.part.5+0x448/0x1a20 [ 48.951171] _do_fork+0x98/0x7e8 [ 48.951171] kernel_thread+0x2c/0x34 [ 48.951171] rest_init+0x1c/0x18c [ 48.951202] start_kernel+0x35c/0x3d4 [ 48.951202] 0x8000807c [ 48.951202] -> #2 (&p->pi_lock){-.-.-.}: [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951232] try_to_wake_up+0x30/0x5c8 [ 48.951232] up+0x4c/0x60 [ 48.951263] __up_console_sem+0x2c/0x58 [ 48.951263] console_unlock+0x3b4/0x650 [ 48.951263] vprintk_emit+0x270/0x474 [ 48.951293] vprintk_default+0x20/0x28 [ 48.951293] printk+0x20/0x30 [ 48.951324] kauditd_hold_skb+0x94/0xb8 [ 48.951324] kauditd_thread+0x1a4/0x56c [ 48.951324] kthread+0x104/0x148 [ 48.951354] ret_from_fork+0x14/0x24 [ 48.951354] -> #1 ((console_sem).lock){-.....}: [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 [ 48.951385] down_trylock+0xc/0x2c [ 48.951385] __down_trylock_console_sem+0x24/0x80 [ 48.951385] console_trylock+0x10/0x8c [ 48.951416] vprintk_emit+0x264/0x474 [ 48.951416] vprintk_default+0x20/0x28 [ 48.951416] printk+0x20/0x30 [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 [ 48.951446] timekeeping_resume+0x218/0x23c [ 48.951477] syscore_resume+0x94/0x42c [ 48.951477] suspend_enter+0x554/0x9b4 [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 [ 48.951507] enter_state+0x934/0xbd4 [ 48.951507] pm_suspend+0x14/0x70 [ 48.951507] state_store+0x68/0xc8 [ 48.951538] kernfs_fop_write+0xf4/0x1f8 [ 48.951538] __vfs_write+0x1c/0x114 [ 48.951538] vfs_write+0xa0/0x168 [ 48.951568] SyS_write+0x3c/0x90 [ 48.951568] __sys_trace_return+0x0/0x10 [ 48.951568] -> #0 (tk_core){----..}: [ 48.951599] lock_acquire+0xe0/0x294 [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 [ 48.951629] retrigger_next_event+0x4c/0x90 [ 48.951629] on_each_cpu+0x40/0x7c [ 48.951629] clock_was_set_work+0x14/0x20 [ 48.951660] process_one_work+0x2b4/0x808 [ 48.951660] worker_thread+0x3c/0x550 [ 48.951660] kthread+0x104/0x148 [ 48.951690] ret_from_fork+0x14/0x24 [ 48.951690] other info that might help us debug this: [ 48.951690] Chain exists of: tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock [ 48.951721] Possible unsafe locking scenario: [ 48.951721] CPU0 CPU1 [ 48.951721] ---- ---- [ 48.951721] lock(hrtimer_bases.lock); [ 48.951751] lock(&rt_b->rt_runtime_lock); [ 48.951751] lock(hrtimer_bases.lock); [ 48.951751] lock(tk_core); [ 48.951782] *** DEADLOCK *** [ 48.951782] 3 locks held by kworker/0:0/3: [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808 [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 [ 48.951843] stack backtrace: [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ #101 [ 48.951873] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 48.951904] Workqueue: events clock_was_set_work [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0) [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308) [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324) [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8) [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294) [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4) [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90) [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c) [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20) [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808) [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550) [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148) [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24) [ 48.952911] PM: noirq resume of devices complete after 33.355 msecs ... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-13 18:59 Regression in next with use printk_safe buffers in printk Tony Lindgren @ 2017-02-14 16:01 ` Sergey Senozhatsky 2017-02-14 16:18 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-14 16:01 UTC (permalink / raw) To: Tony Lindgren Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm Hello, Cc Rafael, just in case On (02/13/17 10:59), Tony Lindgren wrote: > Looks like commit f975237b7682 ("printk: use printk_safe buffers in > printk") causes "possible circular locking dependency detected " for > me on the first suspend. thanks for the report. > Reverting the following four patches in next makes it go away: > > d9c23523ed98 ("printk: drop call_console_drivers() unused param") > de6fcbdb68b2 ("printk: convert the rest to printk-safe") > 8b1742c9c207 ("printk: remove zap_locks() function") > f975237b7682 ("printk: use printk_safe buffers in printk") these patches basically just enable locked where it previously was forcibly turned off. no timekeeping/pm/sched/etc code was modified. can you share the link where Peter pointed out that this might be caused by printk() changes? so the report is > kworker/0:0/3 is trying to acquire lock: > (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 > > but task is already holding lock: > > (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 > > which lock already depends on the new lock. some thoughts. so there is a: BAR -> try_to_wake_up()-> lock_hrtimer_base() ##hrtimer_bases.lock chain (from #5). and there is a: FOO -> printk()->try_to_wake_up() chain (from #2) note, that printk() patches that you have mentioned don't add anything new here. the call chain existed before, we just keep lockdep enabled now. what #1 does is timekeeping_resume() lock timekeeper_lock lock tk_core tk_debug_account_sleep_time() printk() << lockdep was disabled here before try_to_wake_up() lock_hrtimer_base() ##hrtimer_bases.lock so it's tk_core -> hrtimer_bases.lock and retrigger_next_event() does retrigger_next_event() lock hrtimer_bases ktime_get_update_offsets_now() lock tk_core which is hrtimer_bases.lock -> tk_core but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock scenario (so it's not ABBA, but ABAD) > lock(hrtimer_bases.lock); > lock(&rt_b->rt_runtime_lock); > lock(hrtimer_bases.lock); > lock(tk_core); > > > Chain exists of: > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock come from? By the way, as a side note, shouldn't tk_debug_account_sleep_time() do printk_deferred() instead of 'normal' printk()? printk() calls from under timekeeping seqlock are not safe, aren't they? and tk_debug_account_sleep_time() is under tk_core seq lock. IOW, replace pr_info() in tk_debug_account_sleep_time() with something like this printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); --- diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index ca9fb800336b..b8f7146c3538 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t) int bin = min(fls(t->tv_sec), NUM_BINS-1); sleep_time_bin[bin]++; - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", + (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); } --- I'll keep looking at the report more in the meantime. -ss > 8< --------------------- > [ 48.950592] ====================================================== > [ 48.950622] [ INFO: possible circular locking dependency detected ] > [ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted > [ 48.950622] ------------------------------------------------------- > [ 48.950622] kworker/0:0/3 is trying to acquire lock: > [ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90 > [ 48.950683] > but task is already holding lock: > [ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 > [ 48.950714] > which lock already depends on the new lock. > > [ 48.950714] > the existing dependency chain (in reverse order) is: > [ 48.950714] > -> #5 (hrtimer_bases.lock){-.-...}: > [ 48.950744] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.950775] lock_hrtimer_base+0x28/0x58 > [ 48.950775] hrtimer_start_range_ns+0x20/0x5c8 > [ 48.950775] __enqueue_rt_entity+0x320/0x360 > [ 48.950805] enqueue_rt_entity+0x2c/0x44 > [ 48.950805] enqueue_task_rt+0x24/0x94 > [ 48.950836] ttwu_do_activate+0x54/0xc0 > [ 48.950836] try_to_wake_up+0x248/0x5c8 > [ 48.950836] __setup_irq+0x420/0x5f0 > [ 48.950836] request_threaded_irq+0xdc/0x184 > [ 48.950866] devm_request_threaded_irq+0x58/0xa4 > [ 48.950866] omap_i2c_probe+0x530/0x6a0 > [ 48.950897] platform_drv_probe+0x50/0xb0 > [ 48.950897] driver_probe_device+0x1f8/0x2cc > [ 48.950897] __driver_attach+0xc0/0xc4 > [ 48.950927] bus_for_each_dev+0x6c/0xa0 > [ 48.950927] bus_add_driver+0x100/0x210 > [ 48.950927] driver_register+0x78/0xf4 > [ 48.950958] do_one_initcall+0x3c/0x16c > [ 48.950958] kernel_init_freeable+0x20c/0x2d8 > [ 48.950958] kernel_init+0x8/0x110 > [ 48.950988] ret_from_fork+0x14/0x24 > [ 48.950988] > -> #4 (&rt_b->rt_runtime_lock){-.-...}: > [ 48.951019] _raw_spin_lock+0x40/0x50 > [ 48.951019] rq_offline_rt+0x9c/0x2bc > [ 48.951019] set_rq_offline.part.2+0x2c/0x58 > [ 48.951049] rq_attach_root+0x134/0x144 > [ 48.951049] cpu_attach_domain+0x18c/0x6f4 > [ 48.951049] build_sched_domains+0xba4/0xd80 > [ 48.951080] sched_init_smp+0x68/0x10c > [ 48.951080] kernel_init_freeable+0x160/0x2d8 > [ 48.951080] kernel_init+0x8/0x110 > [ 48.951080] ret_from_fork+0x14/0x24 > [ 48.951110] > -> #3 (&rq->lock){-.-.-.}: > [ 48.951110] _raw_spin_lock+0x40/0x50 > [ 48.951141] task_fork_fair+0x30/0x124 > [ 48.951141] sched_fork+0x194/0x2e0 > [ 48.951141] copy_process.part.5+0x448/0x1a20 > [ 48.951171] _do_fork+0x98/0x7e8 > [ 48.951171] kernel_thread+0x2c/0x34 > [ 48.951171] rest_init+0x1c/0x18c > [ 48.951202] start_kernel+0x35c/0x3d4 > [ 48.951202] 0x8000807c > [ 48.951202] > -> #2 (&p->pi_lock){-.-.-.}: > [ 48.951232] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.951232] try_to_wake_up+0x30/0x5c8 > [ 48.951232] up+0x4c/0x60 > [ 48.951263] __up_console_sem+0x2c/0x58 > [ 48.951263] console_unlock+0x3b4/0x650 > [ 48.951263] vprintk_emit+0x270/0x474 > [ 48.951293] vprintk_default+0x20/0x28 > [ 48.951293] printk+0x20/0x30 > [ 48.951324] kauditd_hold_skb+0x94/0xb8 > [ 48.951324] kauditd_thread+0x1a4/0x56c > [ 48.951324] kthread+0x104/0x148 > [ 48.951354] ret_from_fork+0x14/0x24 > [ 48.951354] > -> #1 ((console_sem).lock){-.....}: > [ 48.951385] _raw_spin_lock_irqsave+0x50/0x64 > [ 48.951385] down_trylock+0xc/0x2c > [ 48.951385] __down_trylock_console_sem+0x24/0x80 > [ 48.951385] console_trylock+0x10/0x8c > [ 48.951416] vprintk_emit+0x264/0x474 > [ 48.951416] vprintk_default+0x20/0x28 > [ 48.951416] printk+0x20/0x30 > [ 48.951446] tk_debug_account_sleep_time+0x5c/0x70 > [ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0 > [ 48.951446] timekeeping_resume+0x218/0x23c > [ 48.951477] syscore_resume+0x94/0x42c > [ 48.951477] suspend_enter+0x554/0x9b4 > [ 48.951477] suspend_devices_and_enter+0xd8/0x4b4 > [ 48.951507] enter_state+0x934/0xbd4 > [ 48.951507] pm_suspend+0x14/0x70 > [ 48.951507] state_store+0x68/0xc8 > [ 48.951538] kernfs_fop_write+0xf4/0x1f8 > [ 48.951538] __vfs_write+0x1c/0x114 > [ 48.951538] vfs_write+0xa0/0x168 > [ 48.951568] SyS_write+0x3c/0x90 > [ 48.951568] __sys_trace_return+0x0/0x10 > [ 48.951568] > -> #0 (tk_core){----..}: > [ 48.951599] lock_acquire+0xe0/0x294 > [ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4 > [ 48.951629] retrigger_next_event+0x4c/0x90 > [ 48.951629] on_each_cpu+0x40/0x7c > [ 48.951629] clock_was_set_work+0x14/0x20 > [ 48.951660] process_one_work+0x2b4/0x808 > [ 48.951660] worker_thread+0x3c/0x550 > [ 48.951660] kthread+0x104/0x148 > [ 48.951690] ret_from_fork+0x14/0x24 > [ 48.951690] > other info that might help us debug this: > > [ 48.951690] Chain exists of: > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock > > [ 48.951721] Possible unsafe locking scenario: > > [ 48.951721] CPU0 CPU1 > [ 48.951721] ---- ---- > [ 48.951721] lock(hrtimer_bases.lock); > [ 48.951751] lock(&rt_b->rt_runtime_lock); > [ 48.951751] lock(hrtimer_bases.lock); > [ 48.951751] lock(tk_core); > [ 48.951782] > *** DEADLOCK *** > > [ 48.951782] 3 locks held by kworker/0:0/3: > [ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808 > [ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808 > [ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90 > [ 48.951843] > stack backtrace: > [ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+ #101 > [ 48.951873] Hardware name: Generic OMAP36xx (Flattened Device Tree) > [ 48.951904] Workqueue: events clock_was_set_work > [ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) > [ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0) > [ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308) > [ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324) > [ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8) > [ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294) > [ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4) > [ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90) > [ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c) > [ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20) > [ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808) > [ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550) > [ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148) > [ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24) > [ 48.952911] PM: noirq resume of devices complete after 33.355 msecs > ... > ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:01 ` Sergey Senozhatsky @ 2017-02-14 16:18 ` Peter Zijlstra 2017-02-14 16:56 ` Sergey Senozhatsky 2017-02-14 16:54 ` Tony Lindgren 2017-02-15 18:01 ` Tony Lindgren 2 siblings, 1 reply; 16+ messages in thread From: Peter Zijlstra @ 2017-02-14 16:18 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote: > > but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock > scenario (so it's not ABBA, but ABAD) > > > lock(hrtimer_bases.lock); > > lock(&rt_b->rt_runtime_lock); > > lock(hrtimer_bases.lock); > > lock(tk_core); > > > > > > Chain exists of: > > > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock > > > I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock > come from? rt_b->rt_runtime_lock is one of the scheduler locks, since we do printk() under tk_core, which does semaphore muck, which then includes the entire scheduler chain of locks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:18 ` Peter Zijlstra @ 2017-02-14 16:56 ` Sergey Senozhatsky 2017-02-14 17:03 ` Tony Lindgren 2017-02-14 18:29 ` Peter Zijlstra 0 siblings, 2 replies; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-14 16:56 UTC (permalink / raw) To: Peter Zijlstra Cc: Sergey Senozhatsky, Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, John Stultz On (02/14/17 17:18), Peter Zijlstra wrote: > On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote: > > > > but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock > > scenario (so it's not ABBA, but ABAD) > > > > > lock(hrtimer_bases.lock); > > > lock(&rt_b->rt_runtime_lock); > > > lock(hrtimer_bases.lock); > > > lock(tk_core); > > > > > > > > > Chain exists of: > > > > > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock > > > > > > I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock > > come from? > > rt_b->rt_runtime_lock is one of the scheduler locks, since we do > printk() under tk_core, which does semaphore muck, which then includes > the entire scheduler chain of locks. thanks, Peter. that crossed my mind, but I kinda assumed that we do printk() from under tk_core using sched fair, and rt_runtime_lock is from sched rt. so something like below, perhaps. would be helpful if Tony can test it. (I'll send out this patch 'in a proper way' tomorrow, after some sleep, it's 2am here). 8< ==== >From e1755b0bf7f8a0be5fdf4dd7303bf4cd150d9d20 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Date: Wed, 15 Feb 2017 01:42:18 +0900 Subject: [PATCH] time/timekeeping_debug: use printk_deferred() Do not call printk() from tk_debug_account_sleep_time(), because tk_debug_account_sleep_time() is called under tk_core seq lock. It's not safe to call printk() under tk_core, because console_sem invokes scheduled (via wake_up_process()->activate_task()), which, in turn, can call timekeeping code again, for instance, via get_time()->ktime_get(). This may result in infinite loop on tk_core. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> --- kernel/time/timekeeping_debug.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c index ca9fb800336b..b8f7146c3538 100644 --- a/kernel/time/timekeeping_debug.c +++ b/kernel/time/timekeeping_debug.c @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t) int bin = min(fls(t->tv_sec), NUM_BINS-1); sleep_time_bin[bin]++; - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", + (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); } -- 2.11.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:56 ` Sergey Senozhatsky @ 2017-02-14 17:03 ` Tony Lindgren 2017-02-15 4:44 ` Sergey Senozhatsky 2017-02-14 18:29 ` Peter Zijlstra 1 sibling, 1 reply; 16+ messages in thread From: Tony Lindgren @ 2017-02-14 17:03 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, John Stultz * Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:58]: > On (02/14/17 17:18), Peter Zijlstra wrote: > > On Wed, Feb 15, 2017 at 01:01:40AM +0900, Sergey Senozhatsky wrote: > > > > > > but I'm a bit confused by rt_b->rt_runtime_lock in this unsafe lock > > > scenario (so it's not ABBA, but ABAD) > > > > > > > lock(hrtimer_bases.lock); > > > > lock(&rt_b->rt_runtime_lock); > > > > lock(hrtimer_bases.lock); > > > > lock(tk_core); > > > > > > > > > > > > Chain exists of: > > > > > > > > tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock > > > > > > > > > I'm lacking some knowledge here, sorry. where does the tk_core --> &rt_b->rt_runtime_lock > > > come from? > > > > rt_b->rt_runtime_lock is one of the scheduler locks, since we do > > printk() under tk_core, which does semaphore muck, which then includes > > the entire scheduler chain of locks. > > thanks, Peter. > > that crossed my mind, but I kinda assumed that we do printk() from > under tk_core using sched fair, and rt_runtime_lock is from sched rt. > > > so something like below, perhaps. would be helpful if Tony can test it. > > (I'll send out this patch 'in a proper way' tomorrow, after some sleep, > it's 2am here). > > 8< ==== > > From e1755b0bf7f8a0be5fdf4dd7303bf4cd150d9d20 Mon Sep 17 00:00:00 2001 > From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > Date: Wed, 15 Feb 2017 01:42:18 +0900 > Subject: [PATCH] time/timekeeping_debug: use printk_deferred() > > Do not call printk() from tk_debug_account_sleep_time(), because > tk_debug_account_sleep_time() is called under tk_core seq lock. > It's not safe to call printk() under tk_core, because console_sem > invokes scheduled (via wake_up_process()->activate_task()), which, > in turn, can call timekeeping code again, for instance, via > get_time()->ktime_get(). This may result in infinite loop on > tk_core. > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Thanks yeah this fixes the issue for me: Tested-by: Tony Lindgren <tony@atomide.com> > --- > kernel/time/timekeeping_debug.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c > index ca9fb800336b..b8f7146c3538 100644 > --- a/kernel/time/timekeeping_debug.c > +++ b/kernel/time/timekeeping_debug.c > @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t) > int bin = min(fls(t->tv_sec), NUM_BINS-1); > > sleep_time_bin[bin]++; > - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, > + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", > + (s64)t->tv_sec, > t->tv_nsec / NSEC_PER_MSEC); > } > > -- > 2.11.1 > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 17:03 ` Tony Lindgren @ 2017-02-15 4:44 ` Sergey Senozhatsky 0 siblings, 0 replies; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-15 4:44 UTC (permalink / raw) To: Tony Lindgren Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, John Stultz On (02/14/17 09:03), Tony Lindgren wrote: [..] > > Do not call printk() from tk_debug_account_sleep_time(), because > > tk_debug_account_sleep_time() is called under tk_core seq lock. > > It's not safe to call printk() under tk_core, because console_sem > > invokes scheduled (via wake_up_process()->activate_task()), which, > > in turn, can call timekeeping code again, for instance, via > > get_time()->ktime_get(). This may result in infinite loop on > > tk_core. > > > > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > Thanks yeah this fixes the issue for me: > > Tested-by: Tony Lindgren <tony@atomide.com> thanks. -ss ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:56 ` Sergey Senozhatsky 2017-02-14 17:03 ` Tony Lindgren @ 2017-02-14 18:29 ` Peter Zijlstra 2017-02-15 4:49 ` Sergey Senozhatsky 1 sibling, 1 reply; 16+ messages in thread From: Peter Zijlstra @ 2017-02-14 18:29 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, John Stultz On Wed, Feb 15, 2017 at 01:56:45AM +0900, Sergey Senozhatsky wrote: > that crossed my mind, but I kinda assumed that we do printk() from > under tk_core using sched fair, and rt_runtime_lock is from sched rt. That's all true; lockdep doesn't care :-) All it knows is that at some point those locks nest. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 18:29 ` Peter Zijlstra @ 2017-02-15 4:49 ` Sergey Senozhatsky 0 siblings, 0 replies; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-15 4:49 UTC (permalink / raw) To: Peter Zijlstra Cc: Sergey Senozhatsky, Tony Lindgren, Petr Mladek, Steven Rostedt, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, John Stultz On (02/14/17 19:29), Peter Zijlstra wrote: > On Wed, Feb 15, 2017 at 01:56:45AM +0900, Sergey Senozhatsky wrote: > > that crossed my mind, but I kinda assumed that we do printk() from > > under tk_core using sched fair, and rt_runtime_lock is from sched rt. > > That's all true; lockdep doesn't care :-) All it knows is that at some > point those locks nest. thanks. I think I'll get more familiar with the lockdep splats in coming months :) but it's good (well, so far) that now we keep lockdep enabled in printk. -ss ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:01 ` Sergey Senozhatsky 2017-02-14 16:18 ` Peter Zijlstra @ 2017-02-14 16:54 ` Tony Lindgren 2017-02-15 18:01 ` Tony Lindgren 2 siblings, 0 replies; 16+ messages in thread From: Tony Lindgren @ 2017-02-14 16:54 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm * Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:03]: > Hello, > > Cc Rafael, just in case > > On (02/13/17 10:59), Tony Lindgren wrote: > > Looks like commit f975237b7682 ("printk: use printk_safe buffers in > > printk") causes "possible circular locking dependency detected " for > > me on the first suspend. > > thanks for the report. > > > Reverting the following four patches in next makes it go away: > > > > d9c23523ed98 ("printk: drop call_console_drivers() unused param") > > de6fcbdb68b2 ("printk: convert the rest to printk-safe") > > 8b1742c9c207 ("printk: remove zap_locks() function") > > f975237b7682 ("printk: use printk_safe buffers in printk") > > > these patches basically just enable locked where it previously was > forcibly turned off. no timekeeping/pm/sched/etc code was modified. > can you share the link where Peter pointed out that this might be > caused by printk() changes? Oh sorry I should have been more specific. Not much there to share, I got redirected over to Peter's department in a private email thread while chasing this issue. So that was just Peter's comment looking at the log output. > timekeeping_resume() > lock timekeeper_lock > lock tk_core > tk_debug_account_sleep_time() > printk() << lockdep was disabled here before > try_to_wake_up() > lock_hrtimer_base() ##hrtimer_bases.lock Yeah above seems describe what changed, so your patch makes sense. > shouldn't tk_debug_account_sleep_time() do printk_deferred() instead of > 'normal' printk()? > printk() calls from under timekeeping seqlock are not safe, aren't they? > and tk_debug_account_sleep_time() is under tk_core seq lock. > > IOW, replace pr_info() in tk_debug_account_sleep_time() with something > like this > > printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", > (s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC); Your patch below fixes the issue for me thanks. I had to apply it manually though as tabs got replaced by spaces probably by some mail daemons. Regards, Tony > --- > > diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c > index ca9fb800336b..b8f7146c3538 100644 > --- a/kernel/time/timekeeping_debug.c > +++ b/kernel/time/timekeeping_debug.c > @@ -75,7 +75,8 @@ void tk_debug_account_sleep_time(struct timespec64 *t) > int bin = min(fls(t->tv_sec), NUM_BINS-1); > > sleep_time_bin[bin]++; > - pr_info("Suspended for %lld.%03lu seconds\n", (s64)t->tv_sec, > + printk_deferred(KERN_INFO "Suspended for %lld.%03lu seconds\n", > + (s64)t->tv_sec, > t->tv_nsec / NSEC_PER_MSEC); > } > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-14 16:01 ` Sergey Senozhatsky 2017-02-14 16:18 ` Peter Zijlstra 2017-02-14 16:54 ` Tony Lindgren @ 2017-02-15 18:01 ` Tony Lindgren 2017-02-16 1:31 ` Sergey Senozhatsky 2 siblings, 1 reply; 16+ messages in thread From: Tony Lindgren @ 2017-02-15 18:01 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, Russell King Hi, * Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170214 08:03]: > Hello, > > Cc Rafael, just in case Found another issue on booting ARM devices, so adding Russell too. > On (02/13/17 10:59), Tony Lindgren wrote: > > Looks like commit f975237b7682 ("printk: use printk_safe buffers in > > printk") causes "possible circular locking dependency detected " for > > me on the first suspend. > > thanks for the report. > > > Reverting the following four patches in next makes it go away: > > > > d9c23523ed98 ("printk: drop call_console_drivers() unused param") > > de6fcbdb68b2 ("printk: convert the rest to printk-safe") > > 8b1742c9c207 ("printk: remove zap_locks() function") > > f975237b7682 ("printk: use printk_safe buffers in printk") > > > these patches basically just enable locked where it previously was > forcibly turned off. no timekeeping/pm/sched/etc code was modified. Below is another issue I noticed caused by commit f975237b7682 that I noticed during booting. Regards, Tony 8< -------------------------- [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0. [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch [ 2.597686] [ 2.597717] =============================== [ 2.597717] [ ERR: suspicious RCU usage. ] [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted [ 2.597717] ------------------------------- [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! [ 2.597717] [ 2.597717] other info that might help us debug this: [ 2.597717] [ 2.597717] [ 2.597717] RCU used illegally from idle CPU! [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0 [ 2.597747] RCU used illegally from extended quiescent state! [ 2.597747] 2 locks held by swapper/0/0: [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 [ 2.597747] [ 2.597747] stack backtrace: [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree) [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0) [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650) [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474) [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28) [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30) [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288) [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30) [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80) [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54) [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208) [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac) [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c) [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218) [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c) [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4) [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-15 18:01 ` Tony Lindgren @ 2017-02-16 1:31 ` Sergey Senozhatsky 2017-02-16 4:03 ` Tony Lindgren 0 siblings, 1 reply; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-16 1:31 UTC (permalink / raw) To: Tony Lindgren Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Sergey Senozhatsky, Rafael J. Wysocki, linux-pm, Russell King On (02/15/17 10:01), Tony Lindgren wrote: [..] > Below is another issue I noticed caused by commit f975237b7682 that > I noticed during booting. do you mean that with f975237b7682 you _always_ see that illegal RCU usage warning? > 8< -------------------------- > [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0. > [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch > [ 2.597686] > [ 2.597717] =============================== > [ 2.597717] [ ERR: suspicious RCU usage. ] > [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted > [ 2.597717] ------------------------------- > [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! so this is trace_console(text, len) from cpuidle_enter_state() ... vprintk_emit() << error: Failed to enable monitor mode/CPU 0 failed to disable... console_unlock() call_console_drivers() trace_console() > [ 2.597717] > [ 2.597717] other info that might help us debug this: > [ 2.597717] > [ 2.597717] > [ 2.597717] RCU used illegally from idle CPU! um... is this what rcuidle() for? --- diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 7180088cbb23..34da86e73d00 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, { struct console *con; - trace_console(text, len); + trace_console_rcuidle(text, len); if (!console_drivers) return; --- I need to look more at this. -ss > [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0 > [ 2.597747] RCU used illegally from extended quiescent state! > [ 2.597747] 2 locks held by swapper/0/0: > [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 > [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 > [ 2.597747] > [ 2.597747] stack backtrace: > [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 > [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree) > [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) > [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0) > [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650) > [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474) > [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28) > [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30) > [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288) > [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30) > [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80) > [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54) > [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208) > [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac) > [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c) > [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218) > [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c) > [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4) > [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c) ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-16 1:31 ` Sergey Senozhatsky @ 2017-02-16 4:03 ` Tony Lindgren 2017-02-16 4:25 ` Sergey Senozhatsky 0 siblings, 1 reply; 16+ messages in thread From: Tony Lindgren @ 2017-02-16 4:03 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm, Russell King * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > On (02/15/17 10:01), Tony Lindgren wrote: > [..] > > Below is another issue I noticed caused by commit f975237b7682 that > > I noticed during booting. > > do you mean that with f975237b7682 you _always_ see that illegal RCU > usage warning? Yeah on every boot on devices using cpuidle_coupled. > > 8< -------------------------- > > [ 2.581939] hw-breakpoint: Failed to enable monitor mode on CPU 0. > > [ 2.591613] hw-breakpoint: CPU 0 failed to disable vector catch > > [ 2.597686] > > [ 2.597717] =============================== > > [ 2.597717] [ ERR: suspicious RCU usage. ] > > [ 2.597717] 4.10.0-rc8-next-20170215+ #119 Not tainted > > [ 2.597717] ------------------------------- > > [ 2.597717] ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage! > > so this is trace_console(text, len) from > > cpuidle_enter_state() > ... > vprintk_emit() << error: Failed to enable monitor mode/CPU 0 failed to disable... > console_unlock() > call_console_drivers() > trace_console() > > > > [ 2.597717] > > [ 2.597717] other info that might help us debug this: > > [ 2.597717] > > [ 2.597717] > > [ 2.597717] RCU used illegally from idle CPU! > > um... is this what rcuidle() for? Yeah I think that's all there is to it. We've had a bunch of similar issues pop up over past year :) See for example: $ git log --author=McKenney drivers/base/power > --- > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 7180088cbb23..34da86e73d00 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, > { > struct console *con; > > - trace_console(text, len); > + trace_console_rcuidle(text, len); > > if (!console_drivers) > return; > > --- > > > I need to look more at this. That fixes it for me thanks, so feel free to add: Tested-by: Tony Lindgren <tony@atomide.com> > > [ 2.597717] rcu_scheduler_active = 2, debug_locks = 0 > > [ 2.597747] RCU used illegally from extended quiescent state! > > > > [ 2.597747] 2 locks held by swapper/0/0: > > [ 2.597747] #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54 > > [ 2.597747] #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474 > > [ 2.597747] > > [ 2.597747] stack backtrace: > > [ 2.597778] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119 > > [ 2.597778] Hardware name: Generic OMAP4 (Flattened Device Tree) > > [ 2.597778] [<c0110228>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14) > > [ 2.597778] [<c010c224>] (show_stack) from [<c04ca8a0>] (dump_stack+0xac/0xe0) > > [ 2.597778] [<c04ca8a0>] (dump_stack) from [<c01ab084>] (console_unlock+0x5e8/0x650) > > [ 2.597778] [<c01ab084>] (console_unlock) from [<c01ab35c>] (vprintk_emit+0x270/0x474) > > [ 2.597778] [<c01ab35c>] (vprintk_emit) from [<c01ab6f0>] (vprintk_default+0x20/0x28) > > [ 2.597808] [<c01ab6f0>] (vprintk_default) from [<c0250c94>] (printk+0x20/0x30) > > [ 2.597808] [<c0250c94>] (printk) from [<c0111004>] (reset_ctrl_regs+0x108/0x288) > > [ 2.597808] [<c0111004>] (reset_ctrl_regs) from [<c0111220>] (dbg_cpu_pm_notify+0x28/0x30) > > [ 2.597808] [<c0111220>] (dbg_cpu_pm_notify) from [<c015f3ac>] (notifier_call_chain+0x44/0x80) > > [ 2.597808] [<c015f3ac>] (notifier_call_chain) from [<c0237e48>] (cpu_pm_exit+0x2c/0x54) > > [ 2.597808] [<c0237e48>] (cpu_pm_exit) from [<c0126ca0>] (omap_enter_idle_coupled+0x80/0x208) > > [ 2.597808] [<c0126ca0>] (omap_enter_idle_coupled) from [<c0680d80>] (cpuidle_enter_state+0x118/0x4ac) > > [ 2.597808] [<c0680d80>] (cpuidle_enter_state) from [<c0682e54>] (cpuidle_enter_state_coupled+0x3a8/0x40c) > > [ 2.597839] [<c0682e54>] (cpuidle_enter_state_coupled) from [<c0190c04>] (do_idle+0x1a4/0x218) > > [ 2.597839] [<c0190c04>] (do_idle) from [<c0190ffc>] (cpu_startup_entry+0x18/0x1c) > > [ 2.597839] [<c0190ffc>] (cpu_startup_entry) from [<c0c00c40>] (start_kernel+0x35c/0x3d4) > > [ 2.597839] [<c0c00c40>] (start_kernel) from [<8000807c>] (0x8000807c) ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-16 4:03 ` Tony Lindgren @ 2017-02-16 4:25 ` Sergey Senozhatsky 2017-02-16 15:10 ` Tony Lindgren 0 siblings, 1 reply; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-16 4:25 UTC (permalink / raw) To: Tony Lindgren Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm, Russell King On (02/15/17 20:03), Tony Lindgren wrote: > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > > On (02/15/17 10:01), Tony Lindgren wrote: > > [..] > > > Below is another issue I noticed caused by commit f975237b7682 that > > > I noticed during booting. > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > usage warning? > > Yeah on every boot on devices using cpuidle_coupled. does this mean that with the printk-safe patches reverted (so, basically, the same conditions module 4 printk patches) you don't see illegal RCU usage reports? at the moment I can't see any connection between f975237b7682 and RCU usage from idle CPU. [..] > > --- > > > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > > index 7180088cbb23..34da86e73d00 100644 > > --- a/kernel/printk/printk.c > > +++ b/kernel/printk/printk.c > > @@ -1563,7 +1563,7 @@ static void call_console_drivers(const char *ext_text, size_t ext_len, > > { > > struct console *con; > > > > - trace_console(text, len); > > + trace_console_rcuidle(text, len); > > > > if (!console_drivers) > > return; > > > > --- > > > > > > I need to look more at this. > > That fixes it for me thanks, so feel free to add: > > Tested-by: Tony Lindgren <tony@atomide.com> thanks. -ss ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-16 4:25 ` Sergey Senozhatsky @ 2017-02-16 15:10 ` Tony Lindgren 2017-02-16 16:31 ` Sergey Senozhatsky 0 siblings, 1 reply; 16+ messages in thread From: Tony Lindgren @ 2017-02-16 15:10 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm, Russell King * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 20:26]: > On (02/15/17 20:03), Tony Lindgren wrote: > > * Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> [170215 17:32]: > > > On (02/15/17 10:01), Tony Lindgren wrote: > > > [..] > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > I noticed during booting. > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > usage warning? > > > > Yeah on every boot on devices using cpuidle_coupled. > > does this mean that with the printk-safe patches reverted > (so, basically, the same conditions module 4 printk patches) > you don't see illegal RCU usage reports? at the moment I can't > see any connection between f975237b7682 and RCU usage from idle CPU. Yes reverting those four patches I listed earlier also makes it go away. Regards, Tony ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-16 15:10 ` Tony Lindgren @ 2017-02-16 16:31 ` Sergey Senozhatsky 2017-02-16 19:13 ` Tony Lindgren 0 siblings, 1 reply; 16+ messages in thread From: Sergey Senozhatsky @ 2017-02-16 16:31 UTC (permalink / raw) To: Tony Lindgren Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm, Russell King On (02/16/17 07:10), Tony Lindgren wrote: [..] > > > > [..] > > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > > I noticed during booting. > > > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > > usage warning? > > > > > > Yeah on every boot on devices using cpuidle_coupled. > > > > does this mean that with the printk-safe patches reverted > > (so, basically, the same conditions module 4 printk patches) > > you don't see illegal RCU usage reports? at the moment I can't > > see any connection between f975237b7682 and RCU usage from idle CPU. > > Yes reverting those four patches I listed earlier also makes it go > away. aha... so, the previous RCU warning was simply suppressed by lockdep_off() that we used to have in printk(). RCU dereference check #define __rcu_dereference_check(p, c, space) \ ({ \ /* Dependency order vs. p above. */ \ typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ rcu_dereference_sparse(p, space); \ ((typeof(*p) __force __kernel *)(________p1)); \ }) where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage" is #define RCU_LOCKDEP_WARN(c, s) \ do { \ static bool __section(.data.unlikely) __warned; \ if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ __warned = true; \ lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ } \ } while (0) where debug_lockdep_rcu_enabled() int notrace debug_lockdep_rcu_enabled(void) { return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && current->lockdep_recursion == 0; } depends on lockdep state. and we just used to have 'current->lockdep_recursion != 0' here, because of lockdep_off() in printk() around console_unlock(), which increments ->lockdep_recursion. now we have lockdep enabled and the ->lockdep_recursion == 0. so the RCU warning is valid and I need to Cc stable on that _rcuidle patch, the tracepoint is pretty old. it's from 3.4 -ss ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Regression in next with use printk_safe buffers in printk 2017-02-16 16:31 ` Sergey Senozhatsky @ 2017-02-16 19:13 ` Tony Lindgren 0 siblings, 0 replies; 16+ messages in thread From: Tony Lindgren @ 2017-02-16 19:13 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, Peter Zijlstra, Thomas Gleixner, linux-kernel, Rafael J. Wysocki, linux-pm, Russell King * Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [170216 08:33]: > On (02/16/17 07:10), Tony Lindgren wrote: > [..] > > > > > [..] > > > > > > Below is another issue I noticed caused by commit f975237b7682 that > > > > > > I noticed during booting. > > > > > > > > > > do you mean that with f975237b7682 you _always_ see that illegal RCU > > > > > usage warning? > > > > > > > > Yeah on every boot on devices using cpuidle_coupled. > > > > > > does this mean that with the printk-safe patches reverted > > > (so, basically, the same conditions module 4 printk patches) > > > you don't see illegal RCU usage reports? at the moment I can't > > > see any connection between f975237b7682 and RCU usage from idle CPU. > > > > Yes reverting those four patches I listed earlier also makes it go > > away. > > aha... so, the previous RCU warning was simply suppressed by lockdep_off() > that we used to have in printk(). > > > RCU dereference check > > #define __rcu_dereference_check(p, c, space) \ > ({ \ > /* Dependency order vs. p above. */ \ > typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ > RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ > rcu_dereference_sparse(p, space); \ > ((typeof(*p) __force __kernel *)(________p1)); \ > }) > > > where RCU_LOCKDEP_WARN() that prints "suspicious rcu_dereference_check() usage" > is > > > #define RCU_LOCKDEP_WARN(c, s) \ > do { \ > static bool __section(.data.unlikely) __warned; \ > if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \ > __warned = true; \ > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \ > } \ > } while (0) > > > > where debug_lockdep_rcu_enabled() > > int notrace debug_lockdep_rcu_enabled(void) > { > return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks && > current->lockdep_recursion == 0; > } > > depends on lockdep state. and we just used to have > 'current->lockdep_recursion != 0' here, because of lockdep_off() > in printk() around console_unlock(), which increments ->lockdep_recursion. > > now we have lockdep enabled and the ->lockdep_recursion == 0. > > > so the RCU warning is valid and I need to Cc stable on that _rcuidle > patch, the tracepoint is pretty old. it's from 3.4 OK thanks for checking why it changed. Regards, Tony ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-02-16 19:13 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-02-13 18:59 Regression in next with use printk_safe buffers in printk Tony Lindgren 2017-02-14 16:01 ` Sergey Senozhatsky 2017-02-14 16:18 ` Peter Zijlstra 2017-02-14 16:56 ` Sergey Senozhatsky 2017-02-14 17:03 ` Tony Lindgren 2017-02-15 4:44 ` Sergey Senozhatsky 2017-02-14 18:29 ` Peter Zijlstra 2017-02-15 4:49 ` Sergey Senozhatsky 2017-02-14 16:54 ` Tony Lindgren 2017-02-15 18:01 ` Tony Lindgren 2017-02-16 1:31 ` Sergey Senozhatsky 2017-02-16 4:03 ` Tony Lindgren 2017-02-16 4:25 ` Sergey Senozhatsky 2017-02-16 15:10 ` Tony Lindgren 2017-02-16 16:31 ` Sergey Senozhatsky 2017-02-16 19:13 ` Tony Lindgren
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.