Default enable RCU list lockdep debugging with PROVE_RCU

Message ID 20200228092451.10455-1-madhuparnabhowmik10@gmail.com
State In Next
Commit c9af03c14bfdfd21515e556c3a90ffe2aadc964d
Series
  • Default enable RCU list lockdep debugging with PROVE_RCU

Commit Message

Madhuparna Bhowmik Feb. 28, 2020, 9:24 a.m. UTC
From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>

This patch enables CONFIG_PROVE_RCU_LIST by default whenever
CONFIG_PROVE_RCU is set, turning on RCU list lockdep debugging.

With this change, RCU list lockdep debugging is enabled by default
in CONFIG_PROVE_RCU=y kernels.

Most RCU users (in core kernel/, drivers/, and the net/ subsystem)
have already been modified to pass lockdep expressions, so RCU list
debugging can be enabled by default.

However, not everything has been converted, so false-positive lockdep
splats are still possible where RCU list primitives are used outside
an RCU read-side critical section but under the protection of a lock.
A few false positives are acceptable as long as bugs are identified,
since this patch only affects debugging kernels.

Co-developed-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
---
 kernel/rcu/Kconfig.debug | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)
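
For readers unfamiliar with the check this option enables, the following
is a minimal userspace sketch, not kernel code. It assumes a simplified
model in which a traversal is accepted if it runs inside an RCU read-side
critical section, or if the caller supplies a condition that evaluates
true (in the kernel, typically a lockdep expression such as
lockdep_is_held() passed as the optional last argument of
list_for_each_entry_rcu()). All names below (model_*, demo_lock_held,
count_nodes, demo) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; none of this is the kernel's implementation. */
struct list_node {
	struct list_node *next;
};

static int rcu_read_depth;   /* models rcu_read_lock() nesting depth */
static bool demo_lock_held;  /* models lockdep_is_held(&some_lock) */
static int splat_count;      /* number of traversals that were flagged */

static void model_rcu_read_lock(void)
{
	rcu_read_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(rcu_read_depth > 0);
	rcu_read_depth--;
}

/* Flag the traversal unless cond is true or we are in a reader section. */
static void model_list_check(bool cond, const char *file, int line)
{
	if (!cond && rcu_read_depth == 0) {
		splat_count++;
		fprintf(stderr,
			"%s:%d RCU-list traversed in non-reader section!!\n",
			file, line);
	}
}

/* Walk a circular list, running the check once before the first step. */
#define model_for_each(pos, head, cond)					\
	for (model_list_check((cond), __FILE__, __LINE__),		\
	     (pos) = (head)->next;					\
	     (pos) != (head);						\
	     (pos) = (pos)->next)

static int count_nodes(struct list_node *head, bool cond)
{
	struct list_node *pos;
	int n = 0;

	model_for_each(pos, head, cond)
		n++;
	return n;
}

/* Run three traversals; only the first one should be flagged. */
static int demo(void)
{
	struct list_node head, a, b;

	head.next = &a;
	a.next = &b;
	b.next = &head;
	splat_count = 0;

	/* Neither a reader section nor a condition: flagged. */
	count_nodes(&head, false);

	/* Inside a (modelled) RCU read-side critical section: accepted. */
	model_rcu_read_lock();
	count_nodes(&head, false);
	model_rcu_read_unlock();

	/* Protected by a lock and told so via the condition: accepted. */
	demo_lock_held = true;
	count_nodes(&head, demo_lock_held);
	demo_lock_held = false;

	return splat_count;
}
```

Calling demo() returns the number of flagged traversals. In this model
the third traversal corresponds to a converted user, and the first to an
unconverted one that would produce a splat once the option is enabled.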

Comments

Joel Fernandes Feb. 28, 2020, 2:21 p.m. UTC | #1
On Fri, Feb 28, 2020 at 02:54:51PM +0530, madhuparnabhowmik10@gmail.com wrote:
> From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> 
> This patch default enables CONFIG_PROVE_RCU_LIST option with
> CONFIG_PROVE_RCU for RCU list lockdep debugging.
> 
> With this change, RCU list lockdep debugging will be default
> enabled in CONFIG_PROVE_RCU=y kernels.
> 
> Most of the RCU users (in core kernel/, drivers/, and net/
> subsystem) have already been modified to include lockdep
> expressions hence RCU list debugging can be enabled by
> default.
> 
> However, there are still chances of encountering
> false-positive lockdep splats because not everything is converted,
> in case RCU list primitives are used in non-RCU read-side critical
> section but under the protection of a lock. It would be okay to
> have a few false-positives, as long as bugs are identified, since this
> patch only affects debugging kernels.
> 
> Co-developed-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel

> ---
>  kernel/rcu/Kconfig.debug | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 4aa02eee8f6c..ec4bb6c09048 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -9,15 +9,10 @@ config PROVE_RCU
>  	def_bool PROVE_LOCKING
>  
>  config PROVE_RCU_LIST
> -	bool "RCU list lockdep debugging"
> -	depends on PROVE_RCU && RCU_EXPERT
> -	default n
> +	def_bool PROVE_RCU
>  	help
> -	  Enable RCU lockdep checking for list usages. By default it is
> -	  turned off since there are several list RCU users that still
> -	  need to be converted to pass a lockdep expression. To prevent
> -	  false-positive splats, we keep it default disabled but once all
> -	  users are converted, we can remove this config option.
> +	  Enable RCU lockdep checking for list usages. It is default
> +	  enabled with CONFIG_PROVE_RCU.
>  
>  config TORTURE_TEST
>  	tristate
> -- 
> 2.17.1
>
Paul E. McKenney Feb. 28, 2020, 2:37 p.m. UTC | #2
On Fri, Feb 28, 2020 at 09:21:22AM -0500, Joel Fernandes wrote:
> On Fri, Feb 28, 2020 at 02:54:51PM +0530, madhuparnabhowmik10@gmail.com wrote:
> > From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> > 
> > This patch default enables CONFIG_PROVE_RCU_LIST option with
> > CONFIG_PROVE_RCU for RCU list lockdep debugging.
> > 
> > With this change, RCU list lockdep debugging will be default
> > enabled in CONFIG_PROVE_RCU=y kernels.
> > 
> > Most of the RCU users (in core kernel/, drivers/, and net/
> > subsystem) have already been modified to include lockdep
> > expressions hence RCU list debugging can be enabled by
> > default.
> > 
> > However, there are still chances of encountering
> > false-positive lockdep splats because not everything is converted,
> > in case RCU list primitives are used in non-RCU read-side critical
> > section but under the protection of a lock. It would be okay to
> > have a few false-positives, as long as bugs are identified, since this
> > patch only affects debugging kernels.
> > 
> > Co-developed-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> 
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>

Queued, thank you both!

							Thanx, Paul

Marek Szyprowski March 5, 2020, 10:50 a.m. UTC | #3
Dear All,

On 28.02.2020 10:24, madhuparnabhowmik10@gmail.com wrote:
> From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
>
> This patch default enables CONFIG_PROVE_RCU_LIST option with
> CONFIG_PROVE_RCU for RCU list lockdep debugging.
>
> With this change, RCU list lockdep debugging will be default
> enabled in CONFIG_PROVE_RCU=y kernels.
>
> Most of the RCU users (in core kernel/, drivers/, and net/
> subsystem) have already been modified to include lockdep
> expressions hence RCU list debugging can be enabled by
> default.
>
> However, there are still chances of encountering
> false-positive lockdep splats because not everything is converted,
> in case RCU list primitives are used in non-RCU read-side critical
> section but under the protection of a lock. It would be okay to
> have a few false-positives, as long as bugs are identified, since this
> patch only affects debugging kernels.
>
> Co-developed-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>

This patch landed in today's linux-next (20200305) as commit
c9af03c14bfdfd21515e556c3a90ffe2aadc964d. It causes the following kernel
warnings during a system suspend/resume cycle on all 32-bit ARM Samsung
Exynos-based boards (kernel compiled from exynos_defconfig):

# rtcwake -s 10 -m mem
rtcwake: wakeup from "mem" using /dev/rtc0 at Sat Jan  1 00:01:13 2000
PM: suspend entry (deep)
Filesystems sync: 0.008 seconds
Freezing user space processes ... (elapsed 0.003 seconds) done.
OOM killer disabled.
Freezing remaining freezable tasks ... (elapsed 0.013 seconds) done.
printk: Suspending console(s) (use no_console_suspend to debug)

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/main.c:326 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: c1285d98 (device_links_srcu){....}, at: 
device_links_read_lock+0x0/0x50

stack backtrace:
CPU: 7 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c061ea2c>] 
(dpm_wait_for_subordinate+0xf4/0xfc)
[<c061ea2c>] (dpm_wait_for_subordinate) from [<c061f578>] 
(__device_suspend+0x20/0x838)
[<c061f578>] (__device_suspend) from [<c0622e1c>] (dpm_suspend+0x188/0x57c)
[<c0622e1c>] (dpm_suspend) from [<c0623bfc>] (dpm_suspend_start+0x98/0xa0)
[<c0623bfc>] (dpm_suspend_start) from [<c0197e20>] 
(suspend_devices_and_enter+0xec/0xc74)
[<c0197e20>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/main.c:1698 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
6 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: ebecd984 (&dev->mutex){....}, at: __device_suspend+0xf8/0x838
  #5: c1285d98 (device_links_srcu){....}, at: 
device_links_read_lock+0x0/0x50

stack backtrace:
CPU: 7 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c061f8bc>] (__device_suspend+0x364/0x838)
[<c061f8bc>] (__device_suspend) from [<c0622e1c>] (dpm_suspend+0x188/0x57c)
[<c0622e1c>] (dpm_suspend) from [<c0623bfc>] (dpm_suspend_start+0x98/0xa0)
[<c0623bfc>] (dpm_suspend_start) from [<c0197e20>] 
(suspend_devices_and_enter+0xec/0xc74)
[<c0197e20>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634
wake enabled for irq 160
wake enabled for irq 164
samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
mask: 0xffffffe7

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/wakeup.c:408 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: c128722c (wakeup_srcu){....}, at: 
device_wakeup_arm_wake_irqs+0x0/0x124

stack backtrace:
CPU: 5 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c0625740>] 
(device_wakeup_arm_wake_irqs+0xdc/0x124)
[<c0625740>] (device_wakeup_arm_wake_irqs) from [<c0622120>] 
(dpm_suspend_noirq+0x1c/0x5a0)
[<c0622120>] (dpm_suspend_noirq) from [<c019805c>] 
(suspend_devices_and_enter+0x328/0xc74)
[<c019805c>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/main.c:1238 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: c1285d98 (device_links_srcu){....}, at: 
device_links_read_lock+0x0/0x50

stack backtrace:
CPU: 5 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c06202d4>] 
(__device_suspend_noirq+0x234/0x304)
[<c06202d4>] (__device_suspend_noirq) from [<c0622284>] 
(dpm_suspend_noirq+0x180/0x5a0)
[<c0622284>] (dpm_suspend_noirq) from [<c019805c>] 
(suspend_devices_and_enter+0x328/0xc74)
[<c019805c>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634
Disabling non-boot CPUs ...
IRQ 51: no longer affine to CPU1
IRQ 52: no longer affine to CPU2
IRQ 53: no longer affine to CPU3
IRQ 54: no longer affine to CPU4
IRQ 55: no longer affine to CPU5
IRQ 56: no longer affine to CPU6
IRQ 57: no longer affine to CPU7
Enabling non-boot CPUs ...
CPU1 is up
CPU2 is up
CPU3 is up
CPU4: detected I-Cache line size mismatch, workaround enabled
CPU4 is up
CPU5: detected I-Cache line size mismatch, workaround enabled
CPU5 is up
CPU6: detected I-Cache line size mismatch, workaround enabled
CPU6 is up
CPU7: detected I-Cache line size mismatch, workaround enabled
CPU7 is up

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/main.c:269 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: c1285d98 (device_links_srcu){....}, at: 
device_links_read_lock+0x0/0x50

stack backtrace:
CPU: 0 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c061ebf0>] 
(dpm_wait_for_superior+0x114/0x12c)
[<c061ebf0>] (dpm_wait_for_superior) from [<c061fe98>] 
(device_resume_noirq+0x74/0x238)
[<c061fe98>] (device_resume_noirq) from [<c0620a34>] 
(dpm_resume_noirq+0x160/0x53c)
[<c0620a34>] (dpm_resume_noirq) from [<c01983c8>] 
(suspend_devices_and_enter+0x694/0xc74)
[<c01983c8>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634
s3c-i2c 12c80000.i2c: slave address 0x00
s3c-i2c 12c80000.i2c: bus frequency set to 65 KHz

=============================
WARNING: suspicious RCU usage
5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
-----------------------------
drivers/base/power/wakeup.c:424 RCU-list traversed in non-reader section!!

other info that might help us debug this:


rcu_scheduler_active = 2, debug_locks = 1
5 locks held by rtcwake/1452:
  #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
  #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
  #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
  #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
  #4: c128722c (wakeup_srcu){....}, at: 
device_wakeup_disarm_wake_irqs+0x0/0x124

stack backtrace:
CPU: 0 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
#7728
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
[<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
[<c0b5c50c>] (dump_stack) from [<c0625864>] 
(device_wakeup_disarm_wake_irqs+0xdc/0x124)
[<c0625864>] (device_wakeup_disarm_wake_irqs) from [<c0620b84>] 
(dpm_resume_noirq+0x2b0/0x53c)
[<c0620b84>] (dpm_resume_noirq) from [<c01983c8>] 
(suspend_devices_and_enter+0x694/0xc74)
[<c01983c8>] (suspend_devices_and_enter) from [<c0198da0>] 
(pm_suspend+0x3f8/0x480)
[<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
[<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
[<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
[<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
[<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
[<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xeac89fa8 to 0xeac89ff0)
9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
00000000
9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
00028160
9fe0: 0000006c be980ac8 b6eae000 b6f0b634

I can help debug this issue.

Best regards
Guenter Roeck March 5, 2020, 3:52 p.m. UTC | #4
On Fri, Feb 28, 2020 at 02:54:51PM +0530, madhuparnabhowmik10@gmail.com wrote:
> From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> 
> This patch default enables CONFIG_PROVE_RCU_LIST option with
> CONFIG_PROVE_RCU for RCU list lockdep debugging.
> 
> With this change, RCU list lockdep debugging will be default
> enabled in CONFIG_PROVE_RCU=y kernels.
> 
> Most of the RCU users (in core kernel/, drivers/, and net/
> subsystem) have already been modified to include lockdep
> expressions hence RCU list debugging can be enabled by
> default.
> 
> However, there are still chances of encountering
> false-positive lockdep splats because not everything is converted,
> in case RCU list primitives are used in non-RCU read-side critical
> section but under the protection of a lock. It would be okay to
> have a few false-positives, as long as bugs are identified, since this
> patch only affects debugging kernels.
> 
> Co-developed-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Amol Grover <frextrite@gmail.com>
> Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>

Who is going to fix the fallout?

fs/btrfs/block-group.c:2011 RCU-list traversed in non-reader section!!
kernel/kprobes.c:329 RCU-list traversed in non-reader section!!
net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!!

This is just from my boot tests. I'll keep PROVE_RCU enabled for the
time being, but unless the noise is addressed I'll have to disable it
because otherwise the real problems disappear in the noise.

Guenter
Madhuparna Bhowmik March 5, 2020, 5:23 p.m. UTC | #5
On Thu, Mar 05, 2020 at 11:50:37AM +0100, Marek Szyprowski wrote:
> Dear All,
> 
> On 28.02.2020 10:24, madhuparnabhowmik10@gmail.com wrote:
> > From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> >
> > This patch default enables CONFIG_PROVE_RCU_LIST option with
> > CONFIG_PROVE_RCU for RCU list lockdep debugging.
> >
> > With this change, RCU list lockdep debugging will be default
> > enabled in CONFIG_PROVE_RCU=y kernels.
> >
> > Most of the RCU users (in core kernel/, drivers/, and net/
> > subsystem) have already been modified to include lockdep
> > expressions hence RCU list debugging can be enabled by
> > default.
> >
> > However, there are still chances of encountering
> > false-positive lockdep splats because not everything is converted,
> > in case RCU list primitives are used in non-RCU read-side critical
> > section but under the protection of a lock. It would be okay to
> > have a few false-positives, as long as bugs are identified, since this
> > patch only affects debugging kernels.
> >
> > Co-developed-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> 
> This patch landed in today's linux-next (20200305) as commit 
> c9af03c14bfdfd21515e556c3a90ffe2aadc964d. It causes the following kernel 
> warning during system suspend/resume cycle on all ARM 32bit Samsung 
> Exynos-based boards (kernel compiled from exynos_defconfig):
> 
> # rtcwake -s 10 -m mem
> rtcwake: wakeup from "mem" using /dev/rtc0 at Sat Jan  1 00:01:13 2000
> PM: suspend entry (deep)
> Filesystems sync: 0.008 seconds
> Freezing user space processes ... (elapsed 0.003 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.013 seconds) done.
> printk: Suspending console(s) (use no_console_suspend to debug)
>
Hi,

The warnings in power/main.c and power/wakeup.c have already been
addressed; see https://lore.kernel.org/patchwork/patch/1204515/
Thank you,
Madhuparna
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/main.c:326 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 5 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: c1285d98 (device_links_srcu){....}, at: 
> device_links_read_lock+0x0/0x50
> 
> stack backtrace:
> CPU: 7 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
> #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c061ea2c>] 
> (dpm_wait_for_subordinate+0xf4/0xfc)
> [<c061ea2c>] (dpm_wait_for_subordinate) from [<c061f578>] 
> (__device_suspend+0x20/0x838)
> [<c061f578>] (__device_suspend) from [<c0622e1c>] (dpm_suspend+0x188/0x57c)
> [<c0622e1c>] (dpm_suspend) from [<c0623bfc>] (dpm_suspend_start+0x98/0xa0)
> [<c0623bfc>] (dpm_suspend_start) from [<c0197e20>] 
> (suspend_devices_and_enter+0xec/0xc74)
> [<c0197e20>] (suspend_devices_and_enter) from [<c0198da0>] 
> (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
> 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
> 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> 
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/main.c:1698 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 6 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: ebecd984 (&dev->mutex){....}, at: __device_suspend+0xf8/0x838
>   #5: c1285d98 (device_links_srcu){....}, at: 
> device_links_read_lock+0x0/0x50
> 
> stack backtrace:
> CPU: 7 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
> #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c061f8bc>] (__device_suspend+0x364/0x838)
> [<c061f8bc>] (__device_suspend) from [<c0622e1c>] (dpm_suspend+0x188/0x57c)
> [<c0622e1c>] (dpm_suspend) from [<c0623bfc>] (dpm_suspend_start+0x98/0xa0)
> [<c0623bfc>] (dpm_suspend_start) from [<c0197e20>] 
> (suspend_devices_and_enter+0xec/0xc74)
> [<c0197e20>] (suspend_devices_and_enter) from [<c0198da0>] 
> (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 
> 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 
> 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> wake enabled for irq 160
> wake enabled for irq 164
> samsung-pinctrl 13400000.pinctrl: Setting external wakeup interrupt 
> mask: 0xffffffe7
> 
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/wakeup.c:408 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 5 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: c128722c (wakeup_srcu){....}, at: 
> device_wakeup_arm_wake_irqs+0x0/0x124
> 
> stack backtrace:
> CPU: 5 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd 
> #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c0625740>] 
> (device_wakeup_arm_wake_irqs+0xdc/0x124)
> [<c0625740>] (device_wakeup_arm_wake_irqs) from [<c0622120>] 
> (dpm_suspend_noirq+0x1c/0x5a0)
> [<c0622120>] (dpm_suspend_noirq) from [<c019805c>] 
> (suspend_devices_and_enter+0x328/0xc74)
> [<c019805c>] (suspend_devices_and_enter) from [<c0198da0>] 
> (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> 
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/main.c:1238 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 5 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: c1285d98 (device_links_srcu){....}, at: device_links_read_lock+0x0/0x50
> 
> stack backtrace:
> CPU: 5 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c06202d4>] (__device_suspend_noirq+0x234/0x304)
> [<c06202d4>] (__device_suspend_noirq) from [<c0622284>] (dpm_suspend_noirq+0x180/0x5a0)
> [<c0622284>] (dpm_suspend_noirq) from [<c019805c>] (suspend_devices_and_enter+0x328/0xc74)
> [<c019805c>] (suspend_devices_and_enter) from [<c0198da0>] (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> Disabling non-boot CPUs ...
> IRQ 51: no longer affine to CPU1
> IRQ 52: no longer affine to CPU2
> IRQ 53: no longer affine to CPU3
> IRQ 54: no longer affine to CPU4
> IRQ 55: no longer affine to CPU5
> IRQ 56: no longer affine to CPU6
> IRQ 57: no longer affine to CPU7
> Enabling non-boot CPUs ...
> CPU1 is up
> CPU2 is up
> CPU3 is up
> CPU4: detected I-Cache line size mismatch, workaround enabled
> CPU4 is up
> CPU5: detected I-Cache line size mismatch, workaround enabled
> CPU5 is up
> CPU6: detected I-Cache line size mismatch, workaround enabled
> CPU6 is up
> CPU7: detected I-Cache line size mismatch, workaround enabled
> CPU7 is up
> 
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/main.c:269 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 5 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: c1285d98 (device_links_srcu){....}, at: device_links_read_lock+0x0/0x50
> 
> stack backtrace:
> CPU: 0 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c061ebf0>] (dpm_wait_for_superior+0x114/0x12c)
> [<c061ebf0>] (dpm_wait_for_superior) from [<c061fe98>] (device_resume_noirq+0x74/0x238)
> [<c061fe98>] (device_resume_noirq) from [<c0620a34>] (dpm_resume_noirq+0x160/0x53c)
> [<c0620a34>] (dpm_resume_noirq) from [<c01983c8>] (suspend_devices_and_enter+0x694/0xc74)
> [<c01983c8>] (suspend_devices_and_enter) from [<c0198da0>] (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> s3c-i2c 12c80000.i2c: slave address 0x00
> s3c-i2c 12c80000.i2c: bus frequency set to 65 KHz
> 
> =============================
> WARNING: suspicious RCU usage
> 5.6.0-rc1-00177-gc9af03c14bfd #7728 Not tainted
> -----------------------------
> drivers/base/power/wakeup.c:424 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 5 locks held by rtcwake/1452:
>   #0: edba7270 (sb_writers#7){.+.+}, at: vfs_write+0x16c/0x180
>   #1: ece71f44 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd0/0x228
>   #2: eda72008 (kn->count#98){.+.+}, at: kernfs_fop_write+0xd8/0x228
>   #3: c121223c (system_transition_mutex){+.+.}, at: pm_suspend+0xc0/0x480
>   #4: c128722c (wakeup_srcu){....}, at: device_wakeup_disarm_wake_irqs+0x0/0x124
> 
> stack backtrace:
> CPU: 0 PID: 1452 Comm: rtcwake Not tainted 5.6.0-rc1-00177-gc9af03c14bfd #7728
> Hardware name: Samsung Exynos (Flattened Device Tree)
> [<c0112710>] (unwind_backtrace) from [<c010e1f4>] (show_stack+0x10/0x14)
> [<c010e1f4>] (show_stack) from [<c0b5c50c>] (dump_stack+0xb4/0xe0)
> [<c0b5c50c>] (dump_stack) from [<c0625864>] (device_wakeup_disarm_wake_irqs+0xdc/0x124)
> [<c0625864>] (device_wakeup_disarm_wake_irqs) from [<c0620b84>] (dpm_resume_noirq+0x2b0/0x53c)
> [<c0620b84>] (dpm_resume_noirq) from [<c01983c8>] (suspend_devices_and_enter+0x694/0xc74)
> [<c01983c8>] (suspend_devices_and_enter) from [<c0198da0>] (pm_suspend+0x3f8/0x480)
> [<c0198da0>] (pm_suspend) from [<c019696c>] (state_store+0x6c/0xc8)
> [<c019696c>] (state_store) from [<c0356c78>] (kernfs_fop_write+0x10c/0x228)
> [<c0356c78>] (kernfs_fop_write) from [<c02b52c8>] (__vfs_write+0x30/0x1d0)
> [<c02b52c8>] (__vfs_write) from [<c02b8264>] (vfs_write+0xa4/0x180)
> [<c02b8264>] (vfs_write) from [<c02b84c0>] (ksys_write+0x60/0xd8)
> [<c02b84c0>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xeac89fa8 to 0xeac89ff0)
> 9fa0:                   00000004 0002b440 00000004 0002b440 00000004 00000000
> 9fc0: 00000004 0002b440 000291b0 00000004 0002b440 00000004 be980bfc 00028160
> 9fe0: 0000006c be980ac8 b6eae000 b6f0b634
> 
> I can help debug this issue.
> 
> > ---
> >   kernel/rcu/Kconfig.debug | 11 +++--------
> >   1 file changed, 3 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 4aa02eee8f6c..ec4bb6c09048 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -9,15 +9,10 @@ config PROVE_RCU
> >   	def_bool PROVE_LOCKING
> >   
> >   config PROVE_RCU_LIST
> > -	bool "RCU list lockdep debugging"
> > -	depends on PROVE_RCU && RCU_EXPERT
> > -	default n
> > +	def_bool PROVE_RCU
> >   	help
> > -	  Enable RCU lockdep checking for list usages. By default it is
> > -	  turned off since there are several list RCU users that still
> > -	  need to be converted to pass a lockdep expression. To prevent
> > -	  false-positive splats, we keep it default disabled but once all
> > -	  users are converted, we can remove this config option.
> > +	  Enable RCU lockdep checking for list usages. It is default
> > +	  enabled with CONFIG_PROVE_RCU.
> >   
> >   config TORTURE_TEST
> >   	tristate
> 
> Best regards
> -- 
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
Madhuparna Bhowmik March 5, 2020, 5:39 p.m. UTC | #6
On Thu, Mar 05, 2020 at 07:52:38AM -0800, Guenter Roeck wrote:
> On Fri, Feb 28, 2020 at 02:54:51PM +0530, madhuparnabhowmik10@gmail.com wrote:
> > From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> > 
> > This patch default enables CONFIG_PROVE_RCU_LIST option with
> > CONFIG_PROVE_RCU for RCU list lockdep debugging.
> > 
> > With this change, RCU list lockdep debugging will be default
> > enabled in CONFIG_PROVE_RCU=y kernels.
> > 
> > Most of the RCU users (in core kernel/, drivers/, and net/
> > subsystem) have already been modified to include lockdep
> > expressions hence RCU list debugging can be enabled by
> > default.
> > 
> > However, there are still chances of encountering
> > false-positive lockdep splats because not everything is converted,
> > in case RCU list primitives are used in non-RCU read-side critical
> > section but under the protection of a lock. It would be okay to
> > have a few false-positives, as long as bugs are identified, since this
> > patch only affects debugging kernels.
> > 
> > Co-developed-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Amol Grover <frextrite@gmail.com>
> > Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> 
> Who is going to fix the fallout?
> 
> fs/btrfs/block-group.c:2011 RCU-list traversed in non-reader section!!
> kernel/kprobes.c:329 RCU-list traversed in non-reader section!!
> net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!!
>
Hi,
There is already a patch fixing the warnings in kernel/kprobes.c:
https://lore.kernel.org/lkml/157905963533.2268.4672153983131918123.stgit@devnote2/

The same goes for net/ipv4/ipmr.c:
https://lore.kernel.org/patchwork/patch/1198934/

Could you please send the warning, with the stack backtrace and locks held,
for fs/btrfs/block-group.c? I will work on it.

Thank you,
Madhuparna
> This is just from my boot tests. I'll keep PROVE_RCU enabled for the
> time being, but unless the noise is addressed I'll have to disable it
> because otherwise the real problems disappear in the noise.
> 
> Guenter
Guenter Roeck March 5, 2020, 6:58 p.m. UTC | #7
On Thu, Mar 05, 2020 at 11:09:54PM +0530, Madhuparna Bhowmik wrote:
> On Thu, Mar 05, 2020 at 07:52:38AM -0800, Guenter Roeck wrote:
> > On Fri, Feb 28, 2020 at 02:54:51PM +0530, madhuparnabhowmik10@gmail.com wrote:
> > > From: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> > > 
> > > This patch default enables CONFIG_PROVE_RCU_LIST option with
> > > CONFIG_PROVE_RCU for RCU list lockdep debugging.
> > > 
> > > With this change, RCU list lockdep debugging will be default
> > > enabled in CONFIG_PROVE_RCU=y kernels.
> > > 
> > > Most of the RCU users (in core kernel/, drivers/, and net/
> > > subsystem) have already been modified to include lockdep
> > > expressions hence RCU list debugging can be enabled by
> > > default.
> > > 
> > > However, there are still chances of encountering
> > > false-positive lockdep splats because not everything is converted,
> > > in case RCU list primitives are used in non-RCU read-side critical
> > > section but under the protection of a lock. It would be okay to
> > > have a few false-positives, as long as bugs are identified, since this
> > > patch only affects debugging kernels.
> > > 
> > > Co-developed-by: Amol Grover <frextrite@gmail.com>
> > > Signed-off-by: Amol Grover <frextrite@gmail.com>
> > > Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
> > 
> > Who is going to fix the fallout?
> > 
> > fs/btrfs/block-group.c:2011 RCU-list traversed in non-reader section!!
> > kernel/kprobes.c:329 RCU-list traversed in non-reader section!!
> > net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!!
> >
> Hi,
> There is already a patch fixing the warnings in kernel/kprobes.c:
> https://lore.kernel.org/lkml/157905963533.2268.4672153983131918123.stgit@devnote2/
> 
> The same goes for net/ipv4/ipmr.c:
> https://lore.kernel.org/patchwork/patch/1198934/
> 
> Could you please send the warning, with the stack backtrace and locks held,
> for fs/btrfs/block-group.c? I will work on it.
> 

See below. I think that should be easy to reproduce by mounting
a btrfs file system.

Guenter

---
[   28.920119] BTRFS: device fsid afe7540f-98fe-4a5c-ba94-3fb85a5da345 devid 1 transid 6 /dev/root scanned by swapper/0 (1)
[   28.961347] BTRFS info (device sda): disk space caching is enabled
[   28.963199] BTRFS info (device sda): has skinny extents
[   28.963427] BTRFS info (device sda): flagging fs with big metadata feature
[   29.104392]
[   29.104591] =============================
[   29.104756] WARNING: suspicious RCU usage
[   29.105046] 5.6.0-rc4-next-20200305 #1 Not tainted
[   29.105231] -----------------------------
[   29.105401] fs/btrfs/block-group.c:2011 RCU-list traversed in non-reader section!!
[   29.105643]
[   29.105643] other info that might help us debug this:
[   29.105643]
[   29.106344]
[   29.106344] rcu_scheduler_active = 2, debug_locks = 1
[   29.106590] 1 lock held by swapper/0/1:
[   29.106776]  #0: ffff0000199e90d8 (&type->s_umount_key#23/1){+.+.}, at: alloc_super+0xac/0x2c0
[   29.107436]
[   29.107436] stack backtrace:
[   29.107784] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305 #1
[   29.107989] Hardware name: linux,dummy-virt (DT)
[   29.108344] Call trace:
[   29.108488]  dump_backtrace+0x0/0x1a0
[   29.108638]  show_stack+0x14/0x20
[   29.108784]  dump_stack+0xe8/0x150
[   29.108921]  lockdep_rcu_suspicious+0xf8/0x108
[   29.109071]  btrfs_read_block_groups+0x754/0x860
[   29.109222]  open_ctree+0xe74/0x1580
[   29.109359]  btrfs_mount_root+0x3cc/0x4c0
[   29.109514]  legacy_get_tree+0x2c/0x60
[   29.109653]  vfs_get_tree+0x24/0xe8
[   29.109872]  fc_mount+0x14/0x50
[   29.110010]  vfs_kern_mount.part.41+0x68/0x98
[   29.110158]  vfs_kern_mount+0x10/0x20
[   29.110297]  btrfs_mount+0x158/0x4b0
[   29.110434]  legacy_get_tree+0x2c/0x60
[   29.110573]  vfs_get_tree+0x24/0xe8
[   29.110709]  do_mount+0x568/0x998
[   29.110849]  do_mount_root+0x8c/0x11c
[   29.110990]  mount_block_root+0x114/0x244
[   29.111134]  mount_root+0x124/0x154
[   29.111272]  prepare_namespace+0x128/0x164
[   29.111417]  kernel_init_freeable+0x298/0x2b8
[   29.111569]  kernel_init+0x10/0x100
[   29.111710]  ret_from_fork+0x10/0x18

Patch

diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 4aa02eee8f6c..ec4bb6c09048 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -9,15 +9,10 @@  config PROVE_RCU
 	def_bool PROVE_LOCKING
 
 config PROVE_RCU_LIST
-	bool "RCU list lockdep debugging"
-	depends on PROVE_RCU && RCU_EXPERT
-	default n
+	def_bool PROVE_RCU
 	help
-	  Enable RCU lockdep checking for list usages. By default it is
-	  turned off since there are several list RCU users that still
-	  need to be converted to pass a lockdep expression. To prevent
-	  false-positive splats, we keep it default disabled but once all
-	  users are converted, we can remove this config option.
+	  Enable RCU lockdep checking for list usages. It is default
+	  enabled with CONFIG_PROVE_RCU.
 
 config TORTURE_TEST
 	tristate